Commit-On-Sharing High Level Design

Similar documents
DNE2 High Level Design

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Transactions. ACID Properties of Transactions. Atomicity - all or nothing property - Fully performed or not at all

Transactions. Transaction. Execution of a user program in a DBMS.

TRANSACTION PROCESSING PROPERTIES OF A TRANSACTION TRANSACTION PROCESSING PROPERTIES OF A TRANSACTION 4/3/2014

Remote Directories High Level Design

Lecture 21: Transactional Memory. Topics: Hardware TM basics, different implementations

DHANALAKSHMI COLLEGE OF ENGINEERING, CHENNAI

Lecture 18: Reliable Storage

Lecture 20 Transactions

CSE 486/586: Distributed Systems

Database Tuning and Physical Design: Execution of Transactions

TRANSACTION PROPERTIES

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

CSE 544 Principles of Database Management Systems. Alvin Cheung Fall 2015 Lecture 14 Distributed Transactions

A transaction is a sequence of one or more processing steps. It refers to database objects such as tables, views, joins and so forth.

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Transactions. Lecture 8. Transactions. ACID Properties. Transaction Concept. Example of Fund Transfer. Example of Fund Transfer (Cont.

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Concurrency Control in Distributed Systems. ECE 677 University of Arizona

Lecture X: Transactions

PROJECT 6: PINTOS FILE SYSTEM. CS124 Operating Systems Winter , Lecture 25

Bull. HACMP 4.4 Programming Locking Applications AIX ORDER REFERENCE 86 A2 59KX 02

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

Chapter 15: Transactions

! Design constraints. " Component failures are the norm. " Files are huge by traditional standards. ! POSIX-like

Lecture: Consistency Models, TM. Topics: consistency models, TM intro (Section 5.6)

Fault Tolerance. Goals: transparent: mask (i.e., completely recover from) all failures, or predictable: exhibit a well defined failure behavior

Global atomicity. Such distributed atomicity is called global atomicity A protocol designed to enforce global atomicity is called commit protocol

Goal A Distributed Transaction

NFS: Naming indirection, abstraction. Abstraction, abstraction, abstraction! Network File Systems: Naming, cache control, consistency


BİL 354 Veritabanı Sistemleri. Transaction (Hareket)

What are Transactions? Transaction Management: Introduction (Chap. 16) Major Example: the web app. Concurrent Execution. Web app in execution (CS636)

DB2 Lecture 10 Concurrency Control

Lecture 4: Directory Protocols and TM. Topics: corner cases in directory protocols, lazy TM

Fault Tolerance. o Basic Concepts o Process Resilience o Reliable Client-Server Communication o Reliable Group Communication. o Distributed Commit

Distributed Databases

Transaction Management: Introduction (Chap. 16)

Distributed systems. Lecture 6: distributed transactions, elections, consensus and replication. Malte Schwarzkopf

European Lustre Workshop Paris, France September Hands on Lustre 2.x. Johann Lombardi. Principal Engineer Whamcloud, Inc Whamcloud, Inc.

Chapter 11: File System Implementation. Objectives

Advanced Databases (SE487) Prince Sultan University College of Computer and Information Sciences. Dr. Anis Koubaa. Spring 2014

DATABASE DESIGN I - 1DL300

Chapter 13: Transactions

Batches and Commands. Overview CHAPTER

Fault Tolerance Causes of failure: process failure machine failure network failure Goals: transparent: mask (i.e., completely recover from) all

CS122 Lecture 15 Winter Term,

Intro to Transactions

Transactions. Prepared By: Neeraj Mangla

Recoverability. Kathleen Durant PhD CS3200

Chapter 15: Transactions

Overview of Transaction Management

Transaction Concept. Two main issues to deal with:

Lecture: Consistency Models, TM

Advanced Memory Management

Network File Systems

The Google File System

Optimistic Concurrency Control. April 18, 2018

Chapter 9: Transactions

Imperative Recovery. Jinshan Xiong /jinsan shiung/ 2011 Whamcloud, Inc.

CSE 451 Midterm 1. Name:

Lecture 21: Transactional Memory. Topics: consistency model recap, introduction to transactional memory

Chapter 22. Transaction Management

Lecture: Transactional Memory. Topics: TM implementations

10. Business Process Management

Topics. File Buffer Cache for Performance. What to Cache? COS 318: Operating Systems. File Performance and Reliability

Database System Concepts

UNIT 4 TRANSACTIONS. Objective

Transactions These slides are a modified version of the slides of the book Database System Concepts (Chapter 15), 5th Ed

Concurrency control CS 417. Distributed Systems CS 417

Distributed System. Gang Wu. Spring,2018

Data Modeling and Databases Ch 14: Data Replication. Gustavo Alonso, Ce Zhang Systems Group Department of Computer Science ETH Zürich

Database Management Systems 2010/11

Google File System. Arun Sundaram Operating Systems

Lecture 6: Lazy Transactional Memory. Topics: TM semantics and implementation details of lazy TM

Transactions. Chapter 15. New Chapter. CS 2550 / Spring 2006 Principles of Database Systems. Roadmap. Concept of Transaction.

ICOM 5016 Database Systems. Chapter 15: Transactions. Transaction Concept. Chapter 15: Transactions. Transactions

Introduction. Storage Failure Recovery Logging Undo Logging Redo Logging ARIES

The Google File System

CS 167 Final Exam Solutions

Chapter 14: Transactions

Distributed Databases

CHAPTER: TRANSACTIONS

Database Management Systems

Page 1. Goals of Today s Lecture. The ACID properties of Transactions. Transactions

Ignite Key-Value Transactions Architecture

10. Replication. CSEP 545 Transaction Processing Philip A. Bernstein Sameh Elnikety. Copyright 2012 Philip A. Bernstein

Page 1. Goals of Today s Lecture" Two Key Questions" Goals of Transaction Scheduling"

Consistency in Distributed Systems

Lecture 7: Transactional Memory Intro. Topics: introduction to transactional memory, lazy implementation

Roadmap of This Lecture

Main Points. File System Reliability (part 2) Last Time: File System Reliability. Reliability Approach #1: Careful Ordering 11/26/12

CS370: System Architecture & Software [Fall 2014] Dept. Of Computer Science, Colorado State University

Maintaining the NDS Database

JOURNALING FILE SYSTEMS. CS124 Operating Systems Winter , Lecture 26

Recovery System These slides are a modified version of the slides of the book Database System Concepts (Chapter 17), 5th Ed McGraw-Hill by

Network Protocols. Sarah Diesburg Operating Systems CS 3430

Advanced Databases. Transactions. Nikolaus Augsten. FB Computerwissenschaften Universität Salzburg

SoftWrAP: A Lightweight Framework for Transactional Support of Storage Class Memory

Transcription:

Commit-On-Sharing High Level Design Mike Pershin, Alexander Zam Zarochentsev 8 August 2008 1 Introduction 1.1 Definitions VBR version-based recovery transaction an atomic operation which reads and possibly writes lustre filesystem state including file and directory attributes, directory entries and file contents uncommitted data filesystem state updated by one or more transactions but not yet committed to stable storage inter-client dependent transactions (also dependent transactions) transaction (B) depends on transaction (A) if: 1. (B) reads from an object after (A) has written to it. 2. (A) and (B) are issued by different clients recovery connection re-establishment by a client after a server failure. Recovery is successful if all of the client s uncommitted transactions can be re-executed correctly and its cache can be revalidated. If recovery fails the client is evicted - i.e. it erases all cached state and completes all uncompleted transactions with failure. recovery dependence after a server failure, client (a) is dependent on client (b) for recovery if clients (a) and client (b) have such uncommitted transactions (Ta) and (Tb) respectively that (Ta) depends on (Tb). Thereby client (a) will be evicted unless client (b) participates successfully in recovery. COS Commit on Share - a strategy to eliminate recovery dependence by ensuring that inter-client dependent transaction do not read uncommitted data. 1.2 Background Consider a situation when an MDS server has crashed or re-started. The clients are reconnecting and re-applying their requests to restore connection states. One or more clients missing the recovery may cause other clients to abort their transactions or even to be evicted. The transactions of the missing clients cannot be applied. Moreover a transaction that depends on a missing one cannot be applied correctly and get aborted, the transactions depending on the aborted ones get aborted too and so on. The COS feature attacks the source of these problems by eliminating dependent transactions. If there are no dependent uncommitted transactions to re-apply, the clients apply their requests independently without a menace of being evicted. 1

3 FUNCTIONAL SPECIFICATION 2 Requirements 1. Allow clients to recovery independently 2. The mechanism should be optional to allow users to choose between performance and reliability 3. No changes in wire protocol are allowed 4. Provide compatibility for old clients 3 Functional Specification The proposed solution is to detect inter-client dependent transactions and prevent reading of uncommitted data by doing a commit between the dependent transactions. The solution is built around the LDLM. The following are the highlights of the solution: A client tracking info added to the lock object. We extend the ldlm lock policy field by a tag which identifies the client started the transaction. The point is that ldlm has no notion of client behind request unless lock is explicitly requested by client. After a data modification transaction completes, the corresponding PW lock isn t released immediately. Instead the lock is converted and preserved until transaction commit. The PW lock gets converted into special COS lock. The lock compatible check uses the client tracking information mentioned above to allow any PW/PR lock from the same client and to conflict with any PR/PW lock from any other clients. An early lock conflict (before the lock is converted to a COS lock) is an exception to this lock conversion rule, it is explained in more detail in the Logic section. Another client access is detected as a lock conflict of the client lock request and the COS lock. Once lock conflict happens, we forcing the transaction to commit. A transaction commit releases the COS locks associated with the transaction. 3.1 Relation with REP-ACK REP-ACK and COS are similar. Both mechanisms achieve some levels of guarantee of recoverability of the request before unlocking the object and letting further transactions to proceed. REP-ACK waits a message from the client that request reply was received or transaction commit, COS only waits when the transaction reaches the persistent storage. REP-ACK relies on the client or a disk storage while COS trusts disk storage only. When dependent transactions are detected we use COS instead of REP-ACK. 3.2 Concurrent directory access Concurrent directory access will be supported with pdirops feature which protects hash-associated part of directory with dedicated ldlm resource. COS will apply to hash-associated locks. 3.3 Managing COS New MDS configuration parameter commit_on_sharing added to enable or disable COS, zero value means cos feature is disabled, any non-zero value means the opposite. It makes the COS feature to be controlled through lprocfs or lctl interfaces and cos enable status can be saved as a part of cluster configuration. 2

4 USE CASES 4 Use Cases 4.1 A client modifies object with no changes in cache The object is unlocked and has no uncommitted changes Client issues an object modification request The server takes a PW lock on the object The object gets modified on the server The PW lock gets converted into a COS lock The object remains COS-locked 4.2 The client continues with the write access to the object The changes made to the object by the client are not yet committed to disk. The client issues an object modification request The server requests a PW lock on the object The lock is granted because the COS lock is compatible with any lock request coming from the same client. The object gets modified The client lock is released, the COS lock remains 4.3 The client continues with the read access to the object The changes made to the object by this client are not yet committed to disk. The client issues a request to fetch data from an object The server requests a PR lock on the object The lock is granted because the COS lock is compatible with any lock request coming from the same client The object gets accessed The client lock is released, the COS lock remains 4.4 A client writes to the object recently modified by another client The object was write locked and modified by another client, the changes are not yet committed to the disk. A client issues an object modification request The server requests a PW lock on the object LDLM finds the lock request conflicting with the COS lock LDLM calls the BAST registered with the lock The BAST triggers a transaction commit The COS lock is released by a transaction commit hook The lock request is granted The object gets modified 3

4.5 A client reads the object recently modified by another client 5 LOGIC SPECIFICATION 4.5 A client reads the object recently modified by another client The object was write locked and modified by another client, the changes are not yet committed to the disk. A client issues a request to fetch data from an object LDLM finds the lock request conflicting with the COS lock LDLM calls the BAST registered with the lock The BAST triggers a transaction commit The COS lock is released by a transaction commit hook The lock request is granted The client fetches the object data 4.6 Parallel file creation in one directory A bunch of clients create files in one directory, file name hash collisions considered as rare Clients take locks on hash-associated lock resources of the directory The locks get converted to COS locks Subsequent file creations are compatible with the COS locks and cause no transaction commits 4.7 Enable COS A user issues lctl config_param command lctl set_param mdt.*.commit_on_sharing=1 lctl communicates with the MDS server using procfs interface local MDS servers change their cos status 4.8 Disable COS A user issues lctl config_param command lctl set_param mdt.*.commit_on_sharing=0 lctl communicates with the local MDS servers using procfs interface local MDS servers change their cos status 5 Logic Specification 5.1 Storing client id within ldlm_lock object The ldlm_policy_data_t structure (part of ldlm_lock object) is extended to store connection cookie. The field is initialized at lock request creation and is used to check whether a COS lock and a lock request are compatible (see below). 4

5.2 COS lock 5 LOGIC SPECIFICATION 5.2 COS lock COS lock objects have a new lock mode, LCK_COS. COS locks live only on resource s granted queue, as result of PW locks conversion. COS locks cannot be requested. COS locks cannot be on resource s waiting queue. The ldlm_inodebits_compat_queue() function is changed to compare request locks and COS locks. COS lock compatibility table: Lock request PR/same client PW/same client PR/another client PW/another client COS lock compatible compatible no no 5.3 Triggering transaction commit We define a BAST method for any PW lock taken by server. However, the BAST can be called before the PW lock is converted to COS lock. The transaction might not be started and there is nothing to commit yet. The BAST should perform a commit only if the lock has been converted to COS lock already. If the BAST is called before the lock gets converted, the lock conflict doesn t necessary mean that we get transaction dependency, PW/PR locks from one client may generate a lock conflict as well. We have to do additional check over the lock and the lock request to find whether there is a transaction dependency or not. The check is like the COS compatibility check defined above, but we have no COS lock yet: Lock request PR/same client PW/same client PR/another client PW/another client existing PW lock independent independent dependent dependent If it is found that transactions are dependent, we want the data modification transaction to be committed synchronously. We introduce new force commit bit in the lock flags. The bit is set when transaction dependency is detected but the lock isn t yet converted to a COS one. The bit keeps the information about a need of commit until the data modification is completed and the transaction can be committed. The key modifications to the data modification routine and the blocking AST are illustrated by the following flowchart diagrams: The commits are are done asynchronously, by another thread. We only ask it to commit the transactions. After the thread commits the transactions it calls commit hook where saved locks get released. However commit start could be issued more then once for one transaction. There is number of reasons for that: many COS locks for one resource and thereby many BAST messages and many lock resources involved into the transaction. Some of the sync starts can come late when the transaction has been committed already. It may have a negative effect of forcing the next transaction to commit. The negative effect isn t measured yet and considered as low one for now. 5.4 Server reply and client ACK If we don t use ACK for releasing the locks and the request, we don t set the LNET_ACK_REQ flag with the server reply. 5.5 Saving COS lock until commit REP-ACK already has an infrastructure (ptlrpc_save_lock) to save and keep locks until client ACK or transaction commit. We reuse that whole infrastructure for COS. The COS lock will be kept until transaction commit because we don t require client ACK to be sent on the server reply. 5

8 FOCUS OF INSPECTION 6 State Management 7 Alternatives 7.1 Lazy commit We don t force commit immediately after inter-client dependency is found, but have coming client to wait a commit triggered by timeout or other reasons. 8 Focus of Inspection 6