Weak Consistency and Disconnected Operation in git. Raymond Cheng


Weak Consistency and Disconnected Operation in git Raymond Cheng ryscheng@cs.washington.edu

Motivation
- How can we support disconnected or weakly connected operation?
- Applications
  - File synchronization across users / devices (e.g. Dropbox)
  - Source code control (e.g. git)
  - Disconnected / intermittent connectivity (e.g. laptops, mobile, 3rd world)

Consistency Recap
- Sequential consistency: everyone sees the same read/write order (cache coherence, Paxos)
- Release consistency: everyone sees writes in unlock order (x86/ARM)
  - More generally, reads/writes are forced to complete at memory barriers
- Sequential/release consistency is slow
  - Must wait at each memory barrier
  - Communication is needed if the data was recently modified by another node, or on a write when the local copy is not exclusive

Source Code Control
- Eventual consistency
  - Okay to read/write a cached copy (different versions)
  - Check if it's okay later, recover if necessary
- Track history (with metadata)
- Concurrent editing / many contributors
  - Working copy: don't want files to change beneath you
  - Push / pull to server/peers
  - Contributors may be offline / disconnected
- Cheap branches / merging
- Centralized / distributed

CVS (1990)
- Client-server architecture
  - Check out a working copy
  - Check in your changes
- Server arbitrates the order of changes
  - Only accepts changes to the most recent version of a file
  - Developers must always keep their files up to date
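The "only accept changes to the most recent version" rule can be sketched in a few lines. This is a toy model, not the CVS protocol: `ToyServer`, `StaleRevisionError`, and the integer revision counter are all hypothetical names chosen for illustration.

```python
class StaleRevisionError(Exception):
    """Raised when a commit is not based on the server's newest revision."""


class ToyServer:
    """Hypothetical stand-in for a CVS-style server; not the CVS protocol."""

    def __init__(self):
        self.files = {}  # file name -> (revision counter, content)

    def checkout(self, name):
        return self.files[name]

    def commit(self, name, base_revision, content):
        head, _ = self.files.get(name, (0, ""))
        if base_revision != head:
            # The client edited an old copy: refuse, forcing an update first.
            raise StaleRevisionError(f"{name}: based on {base_revision}, server has {head}")
        self.files[name] = (head + 1, content)
        return head + 1


server = ToyServer()
server.commit("file1", 0, "original text")             # creates revision 1
base_a, _ = server.checkout("file1")                   # first client
base_b, _ = server.checkout("file1")                   # second client
server.commit("file1", base_a, "edit from client A")   # accepted, revision 2

try:
    server.commit("file1", base_b, "edit from client B")
    rejected = False
except StaleRevisionError:
    rejected = True                                    # client B is out of date

# Client B updates, reconciles by hand, and commits again.
base_b, latest = server.checkout("file1")
final_revision = server.commit("file1", base_b, latest + " + edit from client B")
```

The server never merges anything itself; it only orders commits, so the burden of reconciling falls on whichever client commits second.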

CVS Workflow (diagram 1: checkout)
  Server:   revision: 1.2, revision: 1.10
  Client 1: revision: 1.2, revision: 1.10
  Client 1: revision: 1.2, revision: 1.10

CVS Workflow (diagram 2: edit file1, commit file1 r1.3)
  Server:   revision: 1.3, revision: 1.10
  Client 1: revision: 1.3, revision: 1.10 (edit file1)
  Client 1: revision: 1.2, revision: 1.10

CVS Workflow (diagram 3: edit file1, file2; commit file1 file2 FAILS)
  Server:   revision: 1.3, revision: 1.10
  Client 1: revision: 1.3, revision: 1.10
  Client 1: revision: 1.3, revision: 1.11 (edit file1, file2)

CVS Workflow (diagram 4: update file1 r1.3, fix file1 conflicts; commit file1 file2 SUCCEEDS)
  Server:   revision: 1.4, revision: 1.11
  Client 1: revision: 1.3, revision: 1.10
  Client 1: revision: 1.3 => 1.4, revision: 1.11 (fix file1 conflicts)

CVS Limitations
- Everyone is editing the same repository
  - How do you implement a complex feature without constantly conflicting?
- No local version control
  - No log
  - cvs commit ~ git commit && git push
  - How do I time travel?
- No versioning of moving / renaming files
- Depends on a live server to operate
  - Needs to be scaled / backed up / always up / reachable
- Branches were expensive
- Manual locks are common
- No atomicity (e.g. a network failure leads to inconsistency)

Apache SVN (2000)
- "CVS done right"
- Improvements
  - Atomic commits
  - Renamed / moved / copied files retain version history
  - Versioning of directories and metadata
  - Cheap branches / tagging
- Centralized: server/client architecture
- Still active
  - All of Facebook's source code was in a single SVN repository until 2014

Commit Log
  A:0 A:1 A:2 A:3 A:4 A:5

Branching
  A:0 A:1 A:2 A:3 A:4 A:5
  B:1 B:1 B:1

Merging
  A:0 A:1 A:2 A:3 A:4 A:5 A:5
  B:1 B:2 B:3
  Ancestry Set {A: 0-4, B: 1-3}

Merging and Causal Ordering
- Operations that are potentially causally related are seen by every node of the system in the same order
- Example: C1: f=1 -> C2; C2: f=2 -> C3; C3: f=3 -> C1
- Example: C1: a=1 -> C2; C2: b=2 -> C3; C3: c=3 -> C1
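One common way to decide whether two operations are "potentially causally related" is a vector clock. The sketch below is not from the slides; the helper names (`tick`, `merge`, `happened_before`, `concurrent`) and the extra node `C4` are assumptions made for illustration. It replays the C1 -> C2 -> C3 chain from the examples and contrasts it with an unrelated write.

```python
def tick(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def merge(a, b):
    """Element-wise maximum: what a node knows after receiving a message."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


def happened_before(a, b):
    """True if a causally precedes b (a <= b everywhere, < somewhere)."""
    keys = set(a) | set(b)
    no_greater = all(a.get(k, 0) <= b.get(k, 0) for k in keys)
    some_less = any(a.get(k, 0) < b.get(k, 0) for k in keys)
    return no_greater and some_less


def concurrent(a, b):
    """True if the clocks differ and neither precedes the other."""
    keys = set(a) | set(b)
    differ = any(a.get(k, 0) != b.get(k, 0) for k in keys)
    return differ and not happened_before(a, b) and not happened_before(b, a)


w1 = tick({}, "C1")                 # C1 writes, then sends to C2
w2 = tick(merge({}, w1), "C2")      # C2 has seen w1 before writing
w3 = tick(merge({}, w2), "C3")      # C3 has seen w2 (and so w1) before writing
unrelated = tick({}, "C4")          # hypothetical node that saw none of them
```

Here `w1`, `w2`, and `w3` form a causal chain that every node must observe in that order, while `unrelated` is concurrent with all three and may be ordered arbitrarily relative to them.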

Merging
  A:0 A:1 A:2 A:3 A:4 A:5 A:6 A:5
  B:1 B:2 B:3 B:4
  Ancestry Set {A: 0-5, B: 1-4}
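The ancestry sets on the two merging slides can be reproduced with a small graph walk. The sketch assumes a particular shape for the history that the flattened diagrams do not fully show: branch B is taken to fork from A:1, A:5 is taken to merge A:4 with B:3, and A:6 to merge A:5 with B:4. The `parents` table and both function names are hypothetical.

```python
# Commit -> list of parent commits (assumed shape, see the note above).
parents = {
    "A:0": [],
    "A:1": ["A:0"],
    "A:2": ["A:1"],
    "A:3": ["A:2"],
    "A:4": ["A:3"],
    "B:1": ["A:1"],          # assumed branch point
    "B:2": ["B:1"],
    "B:3": ["B:2"],
    "A:5": ["A:4", "B:3"],   # first merge
    "B:4": ["B:3"],
    "A:6": ["A:5", "B:4"],   # second merge
}


def ancestors(commit):
    """Every commit reachable through parent links, excluding commit itself."""
    seen = set()
    stack = list(parents[commit])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(a, b):
    """Common ancestors of a and b that are not ancestors of another one."""
    common = (ancestors(a) | {a}) & (ancestors(b) | {b})
    return {c for c in common if not any(c in ancestors(d) for d in common)}


first_merge_ancestry = ancestors("A:5")
second_merge_ancestry = ancestors("A:6")
first_base = merge_bases("A:4", "B:3")
second_base = merge_bases("A:5", "B:4")
```

Under these assumptions the first merge has to reconcile everything since A:1, whereas the second merge starts from B:3, because the ancestry recorded by A:5 already covers B:1 through B:3. Only B:4 is new work.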

git (2005)
- Distributed! Everyone is a replica
  - Consistency and performance
- Protects from memory, disk corruption
- Cheap branches / merges
- .git/
  - Config
  - Content-addressable filesystem
  - Log of changes (commit history)

Logs (Commit Histories)
- Complete log of changes (needed for time travel with source code control)
- Directed acyclic graph (DAG)
- commit
  - parents
  - deltas (changes to content)
  - hash, for consistency
  - metadata
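A commit record with the four fields listed above might look like the following. This is a deliberately simplified sketch and not git's actual object format: the `Commit` class, its field names, and the way the fields are joined before hashing are assumptions for illustration.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    parents: tuple   # ids of parent commits (empty for the root)
    delta: str       # stand-in for the change to content
    author: str      # stand-in for metadata

    @property
    def commit_id(self):
        # The id covers the parent ids, so it transitively covers all history.
        payload = "\n".join([*self.parents, self.author, self.delta])
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()


root = Commit(parents=(), delta="add README", author="alice")
child = Commit(parents=(root.commit_id,), delta="fix typo", author="bob")

# Rewriting an old commit changes its id, and therefore every descendant's id.
tampered_root = Commit(parents=(), delta="add README!", author="alice")
rebuilt_child = Commit(parents=(tampered_root.commit_id,), delta="fix typo", author="bob")
history_diverged = child.commit_id != rebuilt_child.commit_id
```

Because each id depends on its parents' ids, two replicas that agree on the id of the newest commit also agree on the entire log behind it, which is what makes the hash useful "for consistency".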

Content Addressable Filesystem
- .git/objects
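As a concrete illustration of content addressing, the sketch below computes the id git assigns to a file's contents (a "blob") in a repository using the default SHA-1 object format, and the path where a loose, uncompressed-name object would live under `.git/objects`. The function names are hypothetical; repositories created with the SHA-256 object format would produce different ids.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """SHA-1 of the header 'blob <size>' + NUL byte, followed by the content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def loose_object_path(object_id: str) -> str:
    """First two hex digits name the directory, the rest name the file."""
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


empty_id = git_blob_id(b"")
print(empty_id)                     # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(loose_object_path(empty_id))  # .git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The name of an object is derived purely from its bytes, so identical files are stored once, and any corruption of the stored bytes shows up as a mismatch between name and content.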

Reconciling Conflicts
- Last writer wins (AFS, NFS)
- User-specified conflict handler
- Manual reconciliation (git, svn)
- Conflict-free replicated data types (CRDT)
  - Data types such that conflicts are impossible
  - Operation-based CRDT (e.g. commutative replicated data types)
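To make "data types such that conflicts are impossible" concrete, here is a minimal sketch of a grow-only counter. Note that this is the state-based flavor of CRDT (replicas exchange whole states and merge them), chosen because it is short; it is not the operation-based variant named above. The replica names and function names are hypothetical.

```python
def increment(state, replica, amount=1):
    """Each replica only ever advances its own slot."""
    new = dict(state)
    new[replica] = new.get(replica, 0) + amount
    return new


def merge(a, b):
    """Element-wise maximum: commutative, associative, and idempotent."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in set(a) | set(b)}


def value(state):
    """The counter's value is the sum over all replicas' slots."""
    return sum(state.values())


# Two disconnected replicas count independently...
laptop = increment(increment({}, "laptop"), "laptop")
phone = increment({}, "phone")

# ...and converge to the same state whichever way they sync.
merged_one_way = merge(laptop, phone)
merged_other_way = merge(phone, laptop)
total = value(merged_one_way)
```

Because `merge` gives the same result regardless of order or repetition, replicas can sync in any pattern and still agree, with no conflict handler and no manual reconciliation.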

Operational Transforms