Crash Recovery. Assignment 1 Posted Saturday
Transcription
1 Crash Recovery
Wyatt Lloyd

Assignment 1
- Posted Saturday on GitHub; instructions are in readme.md.
- Posted later than I intended => you get lots of late days.
- Please start ASAP.
- Let me know of any issues with the environment or with versions of Go later than 1.2.
2 Assignment Late Days
- 8 late days for the semester.
- Use them on any assignment (save them for later, harder assignments).
- Used in 1-day increments, e.g., 1 second late = 1 hour late = 1 day late.
- Based on the time of the last anhandin annotated tag.

Assignment 1 Progress
3 Paper Presentations
- Due 4 days early (new; was 1 week), i.e., Thursday at 11:59:59pm for a Tuesday presentation, or Saturday at 11:59:59pm for a Thursday presentation.
- 3 things are due:
  - The required paper summary for that day
  - The supplemental paper summary
  - Slides: send them to Bailan and me in a PowerPoint-compatible format (Google Slides are fine)

Local Fault Tolerance
- Distributed Systems
- Building block for building reliable distributed systems
4 Local Faults
- Power crash fault (focus of today): lose power, regain power, want to keep working
- Kernel panic: common
- Corruption (bit flips): cosmic radiation; use Error Correcting Codes (ECC), the norm in datacenters

Power Faults
What happens when you pull the plug on a server?
- Disk state is maintained: hard drives and SSDs are nonvolatile
- Memory state is lost: DRAM is volatile
- Future memory will not be lost: NVRAM
  - Research: get to rethink system fault tolerance
5 Aside: Memory Not Lost Immediately
Lest We Remember: Cold-Boot Attacks on Encryption Keys
J. Alex Halderman et al. (Princeton), USENIX Security '08
"Contrary to popular assumption, DRAMs used in most modern computers retain their contents for seconds to minutes after power is lost, even at operating temperatures and even if removed from a motherboard."
- Enables attackers to steal encryption keys from memory
[Figure: panels labeled 5s, 30s, 60s, 5min]
6 Crash Faults
Reasonable assumptions for us:
- State saved to disk is still there
- In-memory state is gone

Synchronous Logging
- Essentially use the disk as our memory
- Wait for a synchronous disk write before continuing: int fd = open("journal.log", O_RDWR | O_SYNC);
- Why not do this? Disks are *very* slow

Fundamental tension
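As a rough illustration of the synchronous-logging idea above, the following Python sketch appends a record and returns only after the write has been pushed to the device. The function name `sync_append` and the record format are assumptions made for this example; `O_SYNC` is used when the platform exposes it, with an explicit `os.fsync` as a fallback.

```python
import os


def sync_append(path: str, record: bytes) -> None:
    """Append one record; return only after the OS reports it durable."""
    # O_SYNC is not available on every platform, so fall back to 0 (no flag).
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | getattr(os, "O_SYNC", 0)
    fd = os.open(path, flags, 0o644)
    try:
        os.write(fd, record)
        os.fsync(fd)  # explicit flush; redundant where O_SYNC is honored
    finally:
        os.close(fd)
```

Every call pays the full cost of a disk write, which is the performance problem the slide points at.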
7 Disk Drive Performance Primer
- Random reads/writes
  - In-place updates (e.g., O_SYNC)
  - Seek time + rotational latency: ~10ms + ~5ms
  - ~80 IOPS/drive (from the f4 paper)
- Sequential reads/writes
  - Read/write to contiguous blocks
  - Much faster ( MBps)
- Todo: experiment to see read/write latency for different block sizes
  - Q: Why is this interesting?
- Takeaway: random is slow for disks; sequential is fast for disks, if writes are big enough

Aside: SSD Performance Primer
- All reads are fast and have high throughput
  - No disk head to seek or disk to rotate
- Random writes are still slow and have low throughput
  - Eventually (once the SSD is "full")
  - Also due to how SSDs physically work
    - Must erase a whole block together on flash chips
    - Many parallel flash chips in an SSD
    - Max throughput requires 256 MiB or 512 MiB writes in 3 modern SSDs (RIPQ paper)
- Sequential writes are still fast and have high throughput
  - Higher than disks, e.g., 600MBps vs 200MBps
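The "if writes are big enough" takeaway can be checked with back-of-the-envelope arithmetic. The sketch below uses the slide's ~10ms seek and ~5ms rotational latency, and assumes the 200MBps disk figure from the SSD comparison as the sequential transfer rate; the function names are made up for this example.

```python
SEEK_S = 0.010     # ~10ms seek time (from the slide)
ROTATE_S = 0.005   # ~5ms rotational latency (from the slide)
SEQ_BPS = 200e6    # assumed sequential rate: the 200MBps disk figure


def random_iops() -> float:
    """Upper bound on random operations per second for one drive."""
    return 1.0 / (SEEK_S + ROTATE_S)


def effective_mbps(block_bytes: int) -> float:
    """Throughput when every block costs one seek plus one rotation."""
    seconds = SEEK_S + ROTATE_S + block_bytes / SEQ_BPS
    return block_bytes / seconds / 1e6
```

Under these assumptions `random_iops()` comes out at about 67, the same order as the ~80 IOPS quoted from the f4 paper. A 4 KiB random write achieves about 0.27 MBps, while a 256 MiB write reaches roughly 198 MBps, because the fixed 15ms positioning cost is amortized over a long transfer.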
8 Write-Ahead Logging
- Store everything that matters to disk before we do it
  - LOG: will do Zahaib.status = Presenting today
  - FILE: Zahaib.status = Presenting today
  - LOG: did Zahaib.status = Presenting today
- Typically use a dedicated disk
  - Much faster, but still rotational latency

Recovery
- Replay the log
- Wait for replay to complete before continuing
- Updates should be idempotent, i.e., Zahaib.friends = 500, not += 1
- Remaining issues? Slow recovery; atomicity
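A minimal sketch of the will do / did protocol and replay-based recovery described above. The class name `WriteAheadStore`, the JSON-lines record format, and the in-memory `state` dictionary (standing in for the slide's FILE) are all assumptions made for illustration, not the lecture's implementation.

```python
import json
import os


class WriteAheadStore:
    """Toy key-value store; `state` stands in for the slide's FILE."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.state = {}

    def _log(self, kind, key, value):
        # Append one record and force it to disk before returning.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"kind": kind, "key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())

    def set(self, key, value):
        self._log("will do", key, value)  # LOG: will do
        self.state[key] = value           # FILE: apply the update
        self._log("did", key, value)      # LOG: did

    def recover(self):
        # Replay every "will do" record in log order. Because records are
        # absolute assignments, replaying them any number of times is safe.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from a crash mid-append
                if record["kind"] == "will do":
                    self.state[record["key"]] = record["value"]
```

Logging `Zahaib.friends = 500` rather than `+= 1` is what makes calling `recover()` twice harmless: the second replay writes the same values again.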
9 Speeding Up Recovery
- How can we make recovery faster?
- Remove the completed prefix of the log, i.e., the part of the log where every "will do" has a matching "did"

Atomicity
- All or nothing
- Maintains invariants
- Banking, money transfer: Minlan.account -= $100; Ethan.account += $100;
- Social network, friend addition: Minlan.friends += Ethan; Ethan.friends += Minlan;
- Filesystem, rename: create new directory entry; erase old directory entry
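The prefix-removal rule from the Speeding Up Recovery slide can be sketched as a single scan over the log. Here each record is assumed to be a `(kind, op_id)` tuple, where the `op_id` that pairs a "will do" with its "did" is a hypothetical field introduced for this example.

```python
def truncate_completed_prefix(log):
    """Drop the longest prefix in which every "will do" has a matching "did"."""
    pending = set()  # operations announced but not yet confirmed
    cut = 0          # number of leading records that are safe to drop
    for i, (kind, op_id) in enumerate(log):
        if kind == "will do":
            pending.add(op_id)
        elif kind == "did":
            pending.discard(op_id)
        if not pending:
            cut = i + 1  # everything up to and including record i is complete
    return log[cut:]
```

Recovery then only has to replay the records that remain, which bounds recovery time by the amount of unfinished work rather than by the age of the log.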
10 Atomicity & Logging
Write-ahead logging:
- Will do transaction 1
  - Minlan.friends += Ethan
  - Ethan.friends += Minlan
- Did transaction 1
- Now actually do updates

Atomicity & Recovery
- Same write-ahead log as on the previous slide
- What happens when a failure happens?
- The "did transaction" record identifies the commit
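One way to read "did transaction identifies commit" is that recovery applies a transaction's updates only if its "did" record reached the log. The sketch below assumes a hypothetical in-memory log of tuples, `("will do", txid)`, `("update", txid, owner, friend)`, and `("did", txid)`, and models friend lists as sets so that replaying an addition is idempotent.

```python
def recover(log):
    """Rebuild friend sets, applying only transactions that committed."""
    # A transaction counts as committed only if its "did" record is in the log.
    committed = {txid for kind, txid, *_ in log if kind == "did"}
    friends = {}
    for kind, txid, *args in log:
        if kind == "update" and txid in committed:
            owner, friend = args
            friends.setdefault(owner, set()).add(friend)
    return friends
```

A crash between the two updates of an uncommitted transaction therefore leaves neither friend list changed after recovery, which is the all-or-nothing behavior the slide asks for.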
11 Pretty Simple, Right?
- Unless things you thought were atomic aren't actually
- Unless things you didn't think were written to disk were
- Unless you also get a failure during recovery

ARIES
- Write-ahead logging for databases
- Used in many commercial DBs
- 1992, Transactions on Database Systems, C. Mohan et al. (IBM Research)
- Considered the gold standard
- More complicated than what we've discussed
  - Failure during recovery
  - Aborted transactions
  - Only commits transactions that really committed
12 Recent Research
From ARIES to MARS: Transaction Support for Next-Generation, Solid-State Drives
Joel Coburn, Trevor Bunker, Rajesh K. Gupta, and Steven Swanson (UCSD), SOSP '13
- Write-ahead logging scheme for non-volatile memory, e.g., phase-change memory, spin-transfer torque MRAMs, and the memristor
- WAL without restricting to append-only logs

Should Real Systems Care?
- Yes (battery backups don't stop kernel panics)
13 Should Research Prototypes Care?
- No?
  - Not the focus of most prototypes
  - Can be done properly
- Yes?
  - Can affect design and/or results
  - Improves accuracy of results
  - Could become a real system

Takeaways
- Lecture: use write-ahead logging
- Papers:
  - Very hard to handle local faults properly
  - Critical for real systems
  - Debatably important for research prototypes
- Distributed fault tolerance is even harder
  - Assuming failing nodes fit a specific fault model
14 Intermission

EXPLODE
Jamie Tsao
15 EXPLODE: A Lightweight, General System for Finding Serious Storage System Errors
Junfeng Yang, Can Sar, and Dawson Engler, OSDI 2006 (also BUGS 2005 workshop)

Torturing Storage Systems...
- Must recover correctly from crashes at any point in the program
  - Modifications, flushing
- Testing methods are currently terrible
  - Manual inspection
  - Bug reports from angry users
  - Power cord yanking to simulate power failures
  - Unit testing of undocumented kernel methods?
- Uses model checking, but in situ, running live systems mounted from a lightweight device driver in a stock kernel
  - Alternatives include running a third protection ring inside, partial system checks, or modeling
16 ...and Torturing Databases
- The new paper in class is a bit different
  - Based mostly on database reliability issues: ACID, versus just any storage system invariant
  - Uses power faults as the fault model
  - EXPLODE creates corruptions before propagation to disk
  - EXPLODE targets more OS-and-below-level issues, while this one is about higher-level software

Model-checking in EXPLODE
- Set up with a storage component: init(), mount(), unmount(), recover(), threads()
- Checking by exploring choices
  - mutate() calls choose(n) to branch off to different possible states from calling system-specific methods
  - Calls check_crash_*() to create a crash disk image
    - Permutes over possible write sets
  - Calls check() to verify the condition, logging error cases
- Uses a scheduler to pick states and get_sig() to eliminate duplicates
- Checkpoints states and reruns them deterministically from the choice sequence
- Controls threads to eliminate non-deterministic behavior
- All of this runs on some extra RAM disk, with EKM in a modified kernel
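The idea of branching on choose(n) and rerunning deterministically from a choice sequence can be illustrated with a toy explorer. This is a sketch of the general technique only and not EXPLODE's actual interface: `explore` and the `test(choose)` callback shape are assumptions made for this example.

```python
def explore(test):
    """Run `test` once per distinct choice sequence; return all sequences."""
    explored = []
    stack = [[]]  # choice-sequence prefixes still waiting to be replayed
    while stack:
        prefix = stack.pop()
        trace = []  # choices made so far in the current run

        def choose(n):
            position = len(trace)
            if position < len(prefix):
                pick = prefix[position]  # replay a previously recorded choice
            else:
                pick = 0  # take the first branch now
                for alternative in range(1, n):
                    # Schedule each sibling branch for a later run.
                    stack.append(trace + [alternative])
            trace.append(pick)
            return pick

        test(choose)
        explored.append(tuple(trace))
    return explored
```

Because each run is driven entirely by its recorded prefix, any failing run can be reproduced by replaying the same choice sequence, provided `test` itself is deterministic.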
17 Different check on databases
- Does not force deterministic thread scheduling
  - Allows finding concurrency bugs in databases
- Includes suggestions for workloads and tracing of errors when looking for bugs
  - EXPLODE makes users create the situations to test themselves
- Record/replay is like the logging and checkpoints/states in EXPLODE
- Pattern-based ranking for finding the most problematic areas
- Exhaustive fault injection policy, similar to model-checking
- EXPLODE cannot run on Windows databases
  - EXPLODE is implemented in the Linux kernel
  - The database paper intercepts at the iSCSI layer, so it can run on any OS
- Black box, white box?

EXPLODE's Results
- A lot of bugs found (36 in total)
- This was just from writing a little bit of code
18 ...and the other results
- A bit hard to compare (FS vs. database)
  - ext3 and XFS on Linux to check for FS failure
  - Also did some analysis of pattern-based ranking for vulnerability (EXPLODE doesn't have this)
- Found concurrency problems
- Durability problems most prevalent (7 of 8 databases; the last one hung)
  - Like sync'ing, committing not persistent after recovery
- Also note that all of these databases have issues, despite extensive testing

Take-away
- Model-checking: expanding all possible states and checking all choices from them
  - Corner cases are as easy to find as the common case
  - This is useful for doing some interesting state space searches
- Combine systems together to check
  - Sum of the parts is different from the whole
- All those file systems have bugs
  - A bug-free system is somehow surprising?
19 Alice & Bob
Zahaib Aktar

All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications
Thanumalayan Sankaranarayana Pillai, Vijay Chidambaram, Ramnatthan Alagappan, Samer Al-Kiswany, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau (University of Wisconsin-Madison), OSDI
20 Problem at Hand
- Crash recovery is essential but hard to get right
  - Reason: applications are built atop unreliable filesystems
- What makes filesystems unreliable:
  - Filesystem guarantees are unclear
    - Disk state mutation in case of crashes is largely non-deterministic
    - Different FSs, such as Linux ext3 and ext4, have different robustness
  - Building high-performance crash consistency protocols is hard
    - Non-deterministic state => large number of corner cases
    - Application-level crash consistency protocols are big and complex

Comparison with Torturing DBs
- How is the problem different?
  - Overall the two papers are quite complementary
  - Required paper considers a specific failure instance: power loss
  - Required paper checks ACID consistency; A&B is more general
- Focus is slightly different
  - Required paper focuses on the inability of applications (DBs) to handle crashes
  - Alice and Bob is more general and covers a range of apps (Hadoop, DBs)
  - Alice and Bob examine the shortcomings of the underlying FS
- Similarities
  - Similar in overall goal: expose bugs in case of a crash
  - Techniques are quite similar; both acknowledge the shortcomings of the FS
21 Techniques and Insights
- What FS behaviour is necessary for building crash-consistent apps?
  - Persistence properties
- Are modern application-level crash consistency protocols correct?
- Propose BOB: Block Order Breaker
  - Reorders block traces and finds sequences which break consistency
- Propose ALICE: Application-Level Intelligent Crash Explorer
  - Application updates are a series of sys calls (e.g., append, write)
  - Permutes sys call workloads and analyzes the permutations
  - Finds the persistence properties assumed by applications

Comparison with Torturing DBs
- How does this build upon the required paper?
  - A&B present a much more general framework for analyzing crashes
  - While the req. paper looks at apps, A&B looks at both the FS and apps
- Techniques
  - Pretty similar ideas: both collect block-level traces
  - Req. paper injects failures at different points and checks consistency
  - A&B permutes different block orderings and checks consistency
- Study depth
  - The req. paper acknowledges that the FS can cause problems
  - A&B empirically demonstrate and quantify the extent of the problem
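To make the reordering idea concrete, here is a toy model of enumerating crash states. It is not the actual algorithm of ALICE or BOB: the write list, the `ordered` switch, and the invariant callback are all assumptions made for this sketch. With `ordered=True` only prefixes of the issued writes can persist; with `ordered=False` any subset can, which models a filesystem that reorders writes.

```python
from itertools import combinations


def crash_states(writes, ordered):
    """Yield the on-disk state for each set of writes that may have persisted."""
    n = len(writes)
    if ordered:
        # Only prefixes of the issue order can reach disk.
        persisted_sets = [tuple(range(k)) for k in range(n + 1)]
    else:
        # Any subset can reach disk; survivors are applied in issue order.
        persisted_sets = [c for k in range(n + 1) for c in combinations(range(n), k)]
    for indices in persisted_sets:
        state = {}
        for i in indices:
            key, value = writes[i]
            state[key] = value
        yield state


def violations(writes, invariant, ordered):
    """Return every reachable crash state that breaks the invariant."""
    return [s for s in crash_states(writes, ordered) if not invariant(s)]
```

For an application that writes its data and then a commit flag, the invariant "commit flag implies data present" holds in every ordered crash state but fails in one reordered state, which is the kind of ordering assumption the slide says applications depend on.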
22 Key Findings
- BOB studied six different Linux filesystems
  - Persistence properties vary both between and within filesystems
- App-level consistency depends on the underlying FS persistence properties
  - This dependency is undesirable: it's a crash vulnerability
  - Finds a total of 60 vulnerabilities across 11 apps such as SQLite, Git, HDFS
- Many apps expect ordering among sys calls
  - When ordering is broken: 7/11 apps do not recover from crashes
  - 10/11 apps also expect atomicity of filesystem updates
    - Not so bad: 512-byte (one disk sector) writes and rename ops are mostly atomic
    - But may break in the future with smaller sectors
  - 7/11 apps do not meet durability guarantees

Comparison with Torturing DBs
- Developer assumptions:
  - Req. paper identifies 5 low-level vulnerability patterns
  - Establishes developer ignorance/wrong assumptions
  - A&B also finds wrong developer assumptions to be a major cause of failure
    - A&B also puts blame on ambiguous FS specifications
- Results
  - Req. paper finds that 7/8 DBs violate atomicity constraints
  - Similar findings by A&B w.r.t. appends
  - Both papers reveal a great extent of vulnerabilities in target systems
  - A&B: single vulnerability in PostgreSQL and LMDB already known (validation)
  - A&B: 31/60 vulnerabilities violate a user expectation and not a documented spec
23 Things to remember
- Years of research on filesystem consistency, but
  - Techniques like logging, copy-on-write, and similar approaches fall short
  - Plenty of bugs still remain
- App developers need to be careful on the following accounts
  - Must not assume FS guarantees
  - Different FSs vary greatly; must make apps independent of the FS
- Alice and Bob: but not your everyday Alice and Bob
  - Bob analyzes block-level traces and finds persistence property violations
  - Alice permutes sys calls and analyzes the persistence properties assumed by apps
More informationCS 326: Operating Systems. Lecture 1
CS 326: Operating Systems Lecture 1 Welcome to CS 326! Glad to have you all in class! Lecture Information: Time: T, Th 9:55 11:40am Lab: M 4:45 6:20pm Room: LS G12 Course website: http://www.cs.usfca.edu/~mmalensek/cs326
More informationReplication. Feb 10, 2016 CPSC 416
Replication Feb 10, 2016 CPSC 416 How d we get here? Failures & single systems; fault tolerance techniques added redundancy (ECC memory, RAID, etc.) Conceptually, ECC & RAID both put a master in front
More informationLecture 21: Reliable, High Performance Storage. CSC 469H1F Fall 2006 Angela Demke Brown
Lecture 21: Reliable, High Performance Storage CSC 469H1F Fall 2006 Angela Demke Brown 1 Review We ve looked at fault tolerance via server replication Continue operating with up to f failures Recovery
More informationRethink the Sync 황인중, 강윤지, 곽현호. Embedded Software Lab. Embedded Software Lab.
1 Rethink the Sync 황인중, 강윤지, 곽현호 Authors 2 USENIX Symposium on Operating System Design and Implementation (OSDI 06) System Structure Overview 3 User Level Application Layer Kernel Level Virtual File System
More informationFILE SYSTEMS. CS124 Operating Systems Winter , Lecture 23
FILE SYSTEMS CS124 Operating Systems Winter 2015-2016, Lecture 23 2 Persistent Storage All programs require some form of persistent storage that lasts beyond the lifetime of an individual process Most
More informationCS5200 Database Management Systems Fall 2017 Derbinsky. Recovery. Lecture 15. Recovery
Lecture 15 1 1. Issues and Models Transaction Properties Storage Hierarchy Failure Mode System Log CS5200 Database Management Systems Fall 2017 Derbinsky Outline 2. UNDO Logging (Quiescent) Checkpoints
More informationFile System Consistency. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University
File System Consistency Jin-Soo Kim (jinsookim@skku.edu) Computer Systems Laboratory Sungkyunkwan University http://csl.skku.edu Crash Consistency File system may perform several disk writes to complete
More information416 Distributed Systems. Distributed File Systems 2 Jan 20, 2016
416 Distributed Systems Distributed File Systems 2 Jan 20, 2016 1 Outline Why Distributed File Systems? Basic mechanisms for building DFSs Using NFS and AFS as examples NFS: network file system AFS: andrew
More informationò Very reliable, best-of-breed traditional file system design ò Much like the JOS file system you are building now
Ext2 review Very reliable, best-of-breed traditional file system design Ext3/4 file systems Don Porter CSE 506 Much like the JOS file system you are building now Fixed location super blocks A few direct
More informationTxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions
TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions Yige Hu, Zhiting Zhu, Ian Neal, Youngjin Kwon, Tianyu Chen, Vijay Chidambaram, Emmett Witchel The University of Texas at Austin
More informationRemote Procedure Call (RPC) and Transparency
Remote Procedure Call (RPC) and Transparency Brad Karp UCL Computer Science CS GZ03 / M030 10 th October 2014 Transparency in Distributed Systems Programmers accustomed to writing code for a single box
More informationDistributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen
Distributed Systems Lec 9: Distributed File Systems NFS, AFS Slide acks: Dave Andersen (http://www.cs.cmu.edu/~dga/15-440/f10/lectures/08-distfs1.pdf) 1 VFS and FUSE Primer Some have asked for some background
More informationReminder from last time
Concurrent systems Lecture 7: Crash recovery, lock-free programming, and transactional memory DrRobert N. M. Watson 1 Reminder from last time History graphs; good (and bad) schedules Isolation vs. strict
More informationChapter 9: Log Manager
Handout #14 Chapter 9: Log Manager Overview Reading & Writing the Log Performance Optimizations Making the Log Stable Recovery CS346 - Transaction Processing Markus Breunig - 9 / 1 - Reading & Writing
More informationAdministrivia. CMSC 411 Computer Systems Architecture Lecture 19 Storage Systems, cont. Disks (cont.) Disks - review
Administrivia CMSC 411 Computer Systems Architecture Lecture 19 Storage Systems, cont. Homework #4 due Thursday answers posted soon after Exam #2 on Thursday, April 24 on memory hierarchy (Unit 4) and
More informationEECS 482 Introduction to Operating Systems
EECS 482 Introduction to Operating Systems Winter 2018 Baris Kasikci Slides by: Harsha V. Madhyastha OS Abstractions Applications Threads File system Virtual memory Operating System Next few lectures:
More informationName: Instructions. Problem 1 : Short answer. [48 points] CMU / Storage Systems 23 Feb 2011 Spring 2012 Exam 1
CMU 18-746/15-746 Storage Systems 23 Feb 2011 Spring 2012 Exam 1 Instructions Name: There are three (3) questions on the exam. You may find questions that could have several answers and require an explanation
More informationDB Goals. Concurrency Control & Recovery. Transactions. Std. example - Durability
DB Goals Concurrency Control & Recovery Haeder83: Theo Haerder, Andreas Reuter, ACM Computing Surveys, vol 15, no 4, Dec 1983. Concurrency Control: Individual users see consistent states Even though ops
More informationBoot Camp. Dave Eckhardt Bruce Maggs
Boot Camp Dave Eckhardt de0u@andrew.cmu.edu Bruce Maggs bmm@cs.cmu.edu 1 This Is a Hard Class Traditional hazards 410 letter grade one lower than other classes All other classes this semester: one grade
More informationCOS 318: Operating Systems. NSF, Snapshot, Dedup and Review
COS 318: Operating Systems NSF, Snapshot, Dedup and Review Topics! NFS! Case Study: NetApp File System! Deduplication storage system! Course review 2 Network File System! Sun introduced NFS v2 in early
More informationThe Dangers and Complexities of SQLite Benchmarking. Dhathri Purohith, Jayashree Mohan and Vijay Chidambaram
The Dangers and Complexities of SQLite Benchmarking Dhathri Purohith, Jayashree Mohan and Vijay Chidambaram 2 3 Benchmarking SQLite is Non-trivial! Benchmarking complex systems in a repeatable fashion
More informationNOVA-Fortis: A Fault-Tolerant Non- Volatile Main Memory File System
NOVA-Fortis: A Fault-Tolerant Non- Volatile Main Memory File System Jian Andiry Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Andy Rudoff (Intel), Steven
More information416 Distributed Systems. Errors and Failures Feb 1, 2016
416 Distributed Systems Errors and Failures Feb 1, 2016 Types of Errors Hard errors: The component is dead. Soft errors: A signal or bit is wrong, but it doesn t mean the component must be faulty Note:
More informationThe Google File System
October 13, 2010 Based on: S. Ghemawat, H. Gobioff, and S.-T. Leung: The Google file system, in Proceedings ACM SOSP 2003, Lake George, NY, USA, October 2003. 1 Assumptions Interface Architecture Single
More informationDatabase Hardware Selection Guidelines
Database Hardware Selection Guidelines BRUCE MOMJIAN Database servers have hardware requirements different from other infrastructure software, specifically unique demands on I/O and memory. This presentation
More informationDistributed Systems. Lec 10: Distributed File Systems GFS. Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Distributed Systems Lec 10: Distributed File Systems GFS Slide acks: Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 1 Distributed File Systems NFS AFS GFS Some themes in these classes: Workload-oriented
More informationActions are never left partially executed. Actions leave the DB in a consistent state. Actions are not affected by other concurrent actions
Transaction Management Recovery (1) Review: ACID Properties Atomicity Actions are never left partially executed Consistency Actions leave the DB in a consistent state Isolation Actions are not affected
More information