OPERATING SYSTEM TRANSACTIONS

Similar documents
TxFS: Leveraging File-System Crash Consistency to Provide ACID Transactions

CS 380S. TOCTTOU Attacks. Don Porter. Some slides courtesy Vitaly Shmatikov and Emmett Witchel. slide 1

Operating System Transactions

From Crash Consistency to Transactions. Yige Hu Youngjin Kwon Vijay Chidambaram Emmett Witchel

Transaction Memory for Existing Programs Michael M. Swift Haris Volos, Andres Tack, Shan Lu, Adam Welc * University of Wisconsin-Madison, *Intel

Operating Systems Should Provide Transactions

ECE 550D Fundamentals of Computer Systems and Engineering. Fall 2017

The Dangers and Complexities of SQLite Benchmarking. Dhathri Purohith, Jayashree Mohan and Vijay Chidambaram

Virtual File System. Don Porter CSE 306

Caching and reliability

MEMBRANE: OPERATING SYSTEM SUPPORT FOR RESTARTABLE FILE SYSTEMS

Network File System (NFS)

Network File System (NFS)

COS 318: Operating Systems. Journaling, NFS and WAFL

Secure Architecture Principles

EXPLODE: a Lightweight, General System for Finding Serious Storage System Errors. Junfeng Yang, Can Sar, Dawson Engler Stanford University

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1

GPUfs: Integrating a file system with GPUs

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability

Virtual File System. Don Porter CSE 506

Using Software Transactional Memory In Interrupt-Driven Systems

File System Consistency. Jin-Soo Kim Computer Systems Laboratory Sungkyunkwan University

3/7/18. Secure Coding. CYSE 411/AIT681 Secure Software Engineering. Race Conditions. Concurrency

<Insert Picture Here> Btrfs Filesystem

Birth of Optimistic Methods. On Optimistic Methods for Concurrency Control. Basic performance arg

ò Server can crash or be disconnected ò Client can crash or be disconnected ò How to coordinate multiple clients accessing same file?

File System Consistency

NFS. Don Porter CSE 506

Transactional Memory. How to do multiple things at once. Benjamin Engel Transactional Memory 1 / 28

The benefits and costs of writing a POSIX kernel in a high-level language

RCU. ò Dozens of supported file systems. ò Independent layer from backing storage. ò And, of course, networked file system support

Atomic Transac1ons. Atomic Transactions. Q1: What if network fails before deposit? Q2: What if sequence is interrupted by another sequence?

CYSE 411/AIT681 Secure Software Engineering Topic #13. Secure Coding: Race Conditions

Rethink the Sync 황인중, 강윤지, 곽현호. Embedded Software Lab. Embedded Software Lab.

Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution

University of Wisconsin-Madison

Chapter 11: File System Implementation. Objectives

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission

RCU. ò Walk through two system calls in some detail. ò Open and read. ò Too much code to cover all FS system calls. ò 3 Cases for a dentry:

VFS, Continued. Don Porter CSE 506

Lecture 11: Linux ext3 crash recovery

Chapter 6. File Systems

W4118 Operating Systems. Instructor: Junfeng Yang

GPUfs: Integrating a file system with GPUs

Advanced file systems: LFS and Soft Updates. Ken Birman (based on slides by Ben Atkin)

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Distributed Systems. Lec 9: Distributed File Systems NFS, AFS. Slide acks: Dave Andersen

File Systems: Consistency Issues

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Foundations of the C++ Concurrency Memory Model

Block Device Scheduling. Don Porter CSE 506

Block Device Scheduling

Network File System (NFS)

Designing a True Direct-Access File System with DevFS

GPUfs: Integrating a file system with GPUs

CS370 Operating Systems

STING: Finding Name Resolution Vulnerabilities in Programs

Overview. This Lecture. Interrupts and exceptions Source: ULK ch 4, ELDD ch1, ch2 & ch4. COSC440 Lecture 3: Interrupts 1

Chapter 7: File-System

3/26/2014. Contents. Concepts (1) Disk: Device that stores information (files) Many files x many users: OS management

Fall 2012 Parallel Computer Architecture Lecture 16: Speculation II. Prof. Onur Mutlu Carnegie Mellon University 10/12/2012

CrashMonkey: A Framework to Systematically Test File-System Crash Consistency. Ashlie Martinez Vijay Chidambaram University of Texas at Austin

416 Distributed Systems. Distributed File Systems 1: NFS Sep 18, 2018

CSE506: Operating Systems CSE 506: Operating Systems

Network File Systems

FS Consistency & Journaling

Toward An Integrated Cluster File System

ABORTING CONFLICTING TRANSACTIONS IN AN STM

High Performance Computing on GPUs using NVIDIA CUDA

Chris Rossbach, Owen Hofmann, Don Porter, Hany Ramadan, Aditya Bhandari, Emmett Witchel University of Texas at Austin

CS 471 Operating Systems. Yue Cheng. George Mason University Fall 2017

Lecture 18: Reliable Storage

File Systems. CS170 Fall 2018

F 4. Both the directory structure and the files reside on disk Backups of these two structures are kept on tapes

Potential violations of Serializability: Example 1

TDDB68 Concurrent Programming and Operating Systems. Lecture: File systems

Chí Cao Minh 28 May 2008

Operating Systems Design Exam 2 Review: Spring 2011

CS 416: Opera-ng Systems Design March 23, 2012

GFS: The Google File System

NFS: Naming indirection, abstraction. Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency

Concurrent & Distributed Systems Supervision Exercises

Chris Rossbach, Jon Currey, Microsoft Research Mark Silberstein, Technion Baishakhi Ray, Emmett Witchel, UT Austin SOSP October 25, 2011

Da-Wei Chang CSIE.NCKU. Professor Hao-Ren Ke, National Chiao Tung University Professor Hsung-Pin Chang, National Chung Hsing University

CSE 451: Operating Systems. Section 10 Project 3 wrap-up, final exam review

JOURNALING FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 26

Filesystem. Disclaimer: some slides are adopted from book authors slides with permission 1

Enabling Transactional File Access via Lightweight Kernel Extensions

Reminder from last time

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Block Device Scheduling. Don Porter CSE 506

ViewBox. Integrating Local File Systems with Cloud Storage Services. Yupu Zhang +, Chris Dragga + *, Andrea Arpaci-Dusseau +, Remzi Arpaci-Dusseau +

Ext3/4 file systems. Don Porter CSE 506

Introduction Tasks Memory VFS IPC UI. Escape. Nils Asmussen MKC, 07/09/ / 40

Crash Consistency: FSCK and Journaling. Dongkun Shin, SKKU

MultiLanes: Providing Virtualized Storage for OS-level Virtualization on Many Cores

Soft Updates Made Simple and Fast on Non-volatile Memory

CSE325 Principles of Operating Systems. File Systems. David P. Duggan. March 21, 2013

Virtualization, Xen and Denali

Dynamic Detection and Prevention of Race Conditions in File Accesses

Transcription:

OPERATING SYSTEM TRANSACTIONS Donald E. Porter, Owen S. Hofmann, Christopher J. Rossbach, Alexander Benn, and Emmett Witchel The University of Texas at Austin

OS APIs don t handle concurrency 2 OS is weak link in concurrent programming model Can t make consistent updates to system resources across multiple system calls Race conditions for resources such as the file system No simple work-around Applications can t express consistency requirements OS can t infer requirements

System transactions 3 System transactions ensure consistent updates by concurrent applications Prototype called TxOS Solve problems System level race conditions (TOCTTOU) Build better applications LDAP directory server Software installation

System-level races 4 (root) if(access( foo )) { foo == /etc/passwd } fd = open( foo ); write(fd, ); Time-of-check-to-time-of-use (TOCTTOU) race condition

TOCTTOU race eliminated 5 (root) sys_xbegin(); if(access( foo )) { fd = open( foo ); write(fd, ); } sys_xend();

Example 1: better application design 6 How to make consistent updates to stable storage? Application Enterprise data storage User directory service (LDAP) Editor Technique Database Sys???? Tx rename() Complex Simple

Ex 2: transactional software install 7 sys_xbegin(); apt-get upgrade sys_xend(); A failed install is automatically rolled back Concurrent, unrelated operations are unaffected System crash: reboot to entire upgrade or none

System transactions 8 Simple API: sys_xbegin, sys_xend, sys_xabort Transaction wraps group of system calls Results isolated from other threads until commit Transactions execute concurrently for performance Conflicting transactions must serialize for safety Conflict most often read & write of same datum Too much serialization hurts performance

Related work 9 Developers changing syscall API for concurrency Ad hoc, partial solutions: openat(), etc. System transactions have been proposed and built QuickSilver [SOSP 91], LOCUS [SOSP 85] Key contribution: new design and implementation Uphold strong guarantees and good performance System transactions!= transactional memory TxOS runs on commodity hardware

Outline 10 Example uses of system transactions TxOS design and implementation Evaluation

Building a transactional system 11 Version management Private copies instead of undo log Detect conflicts Minimize performance impact of true conflicts Eliminate false conflicts Resolve conflicts Non-transactional code must respect transactional code

12 TxOS in action CPU 0 (low priority) sys_xbegin(); chmod( f, 0x755); sys_xend(); Contention Mgr. Inode f Header Conflicting Annotation Abort CPU 0 (lower prio) CPU 1 (high priority) sys_xbegin(); chown( f, 1001); sys_xend(); 0x755 1000 Private Copies 0x700 1000 Inode f Data 0x700 1001 Private Inode Copies f Data

System comparison 13 Speculative write location Isolation mechanism Rollback mechanism Commit mechanism Previous Systems TxOS Shared data Private copies of structures Deadlock data structures prone Two-phase Private copies + locking annotations Can cause priority Undo log Discard private inversion copies Discard undo log, Publish private release locks copy by ptr swap

Minimizing false conflicts 14 R R Add/Del W l OK Add/Del+R if different files created, Dir not read Add/Del W Add/Del Add/Del+R Insight: object semantics allow more permissive conflict definition and therefore more concurrency TxOS supports precise conflict definitions per object sys_xbegin(); create( /tmp/foo ); type sys_xend(); sys_xbegin(); create( /tmp/bar ); sys_xend(); Increases concurrency without relaxing isolation

15 Serializing transactions and nontransactions (strong isolation) TxOS mixes transactional and non-tx code In database, everything is transaction Semantically murky in historical systems Critical to correctness Allows incremental adoption of transactions TOCTTOU attacker will not use a transaction Problem: can t roll back non-transactional syscall Always aborting transaction undermines fairness

Strong isolation in TxOS 16 CPU 0 symlink( /etc/passwd, /tmp/foo ); Conflicting Annotation Dentry /tmp/foo Header CPU 1 sys_xbegin(); if(access( /tmp/foo )) open( /tmp/foo ); sys_xend(); Contention Manager Options: Dentry /tmp/foo Data Abort CPU1 Deschedule CPU0

Transactions for application state 17 System transactions only manage system state Applications can select their approach Copy-on-write paging Hardware or Software Transactional Memory (TM) Application-specific compensation code

Transactions: a core OS abstraction 18 Easy to make kernel subsystems transactional Transactional filesystems in TxOS Transactions implemented in VFS or higher FS responsible for atomic updates to stable store Journal + TxOS = Transactional Filesystem 1 developer-month transactional ext3 prototype

Evaluation 19 Example uses of system transactions TxOS design and implementation Evaluation What is the cost of using transactions? What overheads are imposed on non-transactional applications?

TxOS Prototype 20 Extend Linux 2.6.22 to support system transactions Add 8,600 LOC to Linux Minor modifications to 14,000 LOC Runs on commodity hardware Transactional semantics for a range of resources: File system, signals, processes, pipes

Hardware and benchmarks 21 Quadcore 2.66 GHz Intel Core 2 CPU, 4 GB RAM Benchmark Description install install of svn 1.4.4 make Compile nano 2.06 inside a tx dpkg dpkg install OpenSSH 4.6 LFS large/small RAB Wrap each phase in a tx Reimplemeted Andrew Benchmark Each phase in a tx

Transactional software install 22 sys_xbegin(); dpkg i openssh; sys_xend(); sys_xbegin(); install svn; sys_xend(); 10% overhead 70% overhead A failed install is automatically rolled back Concurrent, unrelated operations are unaffected System crash: reboot to entire upgrade or none

Transaction overheads 23 install dpkg make LFS Small Delete LFS Small Read LFS Large Read Rnd Execution Time Normalized to Linux Memory overheads on LFS large: 13% high, 5% low (kernel) 0 0.5 1 1.5 2 2.5 3

Write speedups 24 Speedup over Linux RAB cp RAB mkdir LFS L Write Rand LFS L Write Seq LFS S Create 0 2 4 6 8 10 12 14 16 18 20 Better I/O scheduling not luck Tx boundaries provide I/O scheduling hint to OS

rename() Lightweight DB alternative Sys Tx Databases 25 OpenLDAP directory server Replace BDB backend with transactions + flat files 2-4.2x speedup on write-intensive workloads Comparable performance on read-only workloads Primarily serviced from memory cache

Non-transactional overheads 26 Non-transactional Linux compile: <2% on TxOS Transactions are pay-to-play Single system call: 42% geometric mean With additional optimizations: 14% geomean Optimizations approximated by eliding checks

What is practical? 27 1.2 1.15 1.1 1.05 1 Mean Linux Syscall Overhead, Normalized to 2.6.22 22 08/07 23 24 25 26 27 28 29 30 31 09/09 Feature creep over 2 years costs 16% Developers are willing to give up performance for useful features Transactions are in same range (14%), more powerful

OSes should support transactions 28 Practical implementation techniques for modern OS Transactions solve long-standing problems Replace ad hoc solutions Transactions enable better concurrent programs http://www.cs.utexas.edu/~porterde/txos porterde@cs.utexas.edu

29 Backup Slides

Windows kernel transaction manager 30 Framework for 2-Phase Commit Coordinate transactional file system, registry Transactional FS and registry Completely different implementation FS updates in place, Registry uses private copies Little opportunity for code reuse across subsystems Explicitly transacted code More conservative, limited design choice TxOS allows implicit transactions, application wrappers

Distributed transactions 31 User/language-level transactions Cannot isolate OS managed resources TABS [SOSP 85], Argus [SOSP 87], Sinfonia [SOSP 07] TABS transactional windows manager Grayed out aborted dialog Argus similar strategies for limiting false conflicts

Transactional file systems 32 Good idea, difficult to implement Challenging to implement below VFS layer Valor [FAST 09] introduces OS support in page cache Lack simple abstractions Users must understand implementation details Deadlock detection (Transactional NTFS) Logging and locking mechanism (Valor) Lack support for other OS resources in transactions Windows KTM supports transactional registry

Speculator 33 Goal: hide latency of operations NFS client requests, synchronous writes, etc. Similar implementation at points Different goals, not sufficient to provide transactional semantics Isolation vs. dependences

xcalls [EuroSys 09] 34 User-level techniques for transactional system calls Within a single application only Works for many common cases (buffering writes) Edge cases difficult without system support E.g., close() or munmap() can implicitly delete a file