What is version control? (discuss) Who has used version control? Favorite VCS? Uses of version control (read)


For the remainder of class today, I want to introduce a topic we will spend one or two more classes on: source code control, also called version control. What is version control? (discuss) Who has used version control? Favorite VCS? Uses of version control (read)

There are several different types of version control. The earliest systems used local version control. Utilities such as diff and patch can be used to implement a form of version control. (tell story about Mom's tar, diff, patch system) RCS is a popular local version control system still in use today. It might be useful on a system with no network.

The next iteration of version control was to store the different versions on a centralized server that every developer connected to. This allowed developers working on different systems to collaborate on the same project. Basically, a single server contained all the versions of each file, and the client systems would check files in and out of this central location.

There are risks to doing it this way. The most obvious is the single point of failure that the centralized server represents. If that server goes down for an hour, then during that hour nobody can collaborate at all or save versioned changes to anything they're working on. If the hard disk the central database is on becomes corrupted, and proper backups haven't been kept, you lose absolutely everything: the entire history of the project, except whatever single snapshots people happen to have on their local machines. Local VCSs suffer from this same problem: whenever you have the entire history of the project in a single place, you risk losing everything.

So, the solution that was proposed and developed was to use a distributed VCS, where every client keeps a full mirror of the entire repository. Now, obviously the downside is that you have extra data stored on each client. But every clone of the repo is now a full backup of the data. So, if a server dies, and these systems were collaborating via that server, any of the client repositories can be copied back up to the server to restore it.

We're going to study Git in this course. The major difference between Git and any other VCS (Subversion and friends included) is the way Git thinks about its data. Conceptually, most other systems store information as a list of file-based changes.

(Draw: versions V1 through V5 across the top; one row each for files A, B, and C; mark the deltas D1, D2, ... for each file under the versions where it changed.)

These systems (CVS, Subversion, Perforce, Bazaar, and so on) think of the information they keep as a set of files and the changes made to each file over time.

Git thinks of its data more like a set of snapshots of a miniature filesystem. Every time you commit, or save the state of your project in Git, it basically takes a picture of what all your files look like at that moment and stores a reference to that snapshot. To be efficient, if files have not changed, Git doesn't store the file again, just a link to the previous identical file it has already stored. In short, Git treats its data as a stream of snapshots.

Most operations in Git only need local files and resources to operate; generally no information is needed from another computer on your network. For example, to browse the history of the project, Git doesn't need to go out to the server to get the history and display it for you. It simply reads it directly from your local database. This means you see the project history almost instantly. If you want to see the changes introduced between the current version of a file and the file a month ago, Git can look up the file from a month ago and do a local difference calculation, instead of having to either ask a remote server to do it or pull an older version of the file from the remote server to do it locally. Working offline is nice.

Git also has some nice integrity guarantees. Everything in Git is checksummed before it is stored and is then referred to by that checksum. This means it's impossible to change the contents of any file or directory without Git knowing about it. The mechanism that Git uses for this checksumming is called a SHA-1 hash. This is a 40-character string composed of hexadecimal characters (0-9 and a-f) and calculated based on the contents of a file or directory structure in Git.

When you do actions in Git, nearly all of them only add data to the Git database. It is hard to get the system to do anything that is not undoable or to make it erase data in any way. This makes using Git a joy, because we know we can experiment without the danger of severely screwing things up.
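To see the checksumming in action, here is a minimal sketch using git hash-object, which prints the ID Git would assign to some content. The sample strings are made up for illustration; nothing is stored, and no repository is needed.

```shell
# Hash the same content twice, then a slightly different content.
# (The sample text is hypothetical; only the git hash-object call matters.)
first=$(printf 'test content\n' | git hash-object --stdin)
second=$(printf 'test content\n' | git hash-object --stdin)
changed=$(printf 'test content!\n' | git hash-object --stdin)

echo "$first"      # ID of the original content
echo "$second"     # identical content, so the identical ID
echo "$changed"    # one extra character, so a completely different ID
echo "${#first} hexadecimal characters"
```

Because the ID is derived from the content, any change to the content shows up as a different ID.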

This is the main thing to remember about Git if you want the rest of your learning process to go smoothly. Git has three main states that your files can reside in: committed, modified, and staged. Committed means that the data is safely stored in your local database. Modified means that you have changed the file but have not committed it to your database yet. Staged means that you have marked a modified file in its current version to go into your next commit snapshot.

There are three main sections of a Git project. The Git directory is where Git stores the metadata and object database for your project. This is the most important part of Git, and it is what is copied when you clone a repository from another computer. The working directory is a single checkout of one version of the project. These files are pulled out of the compressed database in the Git directory and placed on disk for you to use or modify. The staging area is a file, generally contained in your Git directory, that stores information about what will go into your next commit. It's sometimes referred to as the index, but it's also common to refer to it as the staging area.

The basic Git workflow goes something like this:

1. You modify files in your working directory.
2. You stage the files, adding snapshots of them to your staging area.
3. You do a commit, which takes the files as they are in the staging area and stores that snapshot permanently to your Git directory.

If a particular version of a file is in the Git directory, it's considered committed. If it has been modified and was added to the staging area, it is staged. And if it was changed since it was checked out but has not been staged, it is modified.
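The modify, stage, commit cycle can be sketched in a throwaway repository. The file name, commit messages, and demo identity below are assumptions made up for this example; git status --short is used to peek at the state after each step.

```shell
set -e
repo=$(mktemp -d)                        # throwaway demo directory
cd "$repo"
git init -q
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"

echo "first draft" > notes.txt
git status --short                       # new file, not yet known to Git
git add notes.txt                        # stage it
git commit -q -m "Add notes"             # store the snapshot

echo "second draft" >> notes.txt         # 1. modify in the working directory
modified=$(git status --short)
git add notes.txt                        # 2. stage the change
staged=$(git status --short)
git commit -q -m "Revise notes"          # 3. commit the staged snapshot
committed=$(git status --short)

echo "modified:  '$modified'"            # change sits in the working directory only
echo "staged:    '$staged'"              # change sits in the staging area
echo "committed: '$committed'"           # nothing left to report
```

In the short status format, the first column describes the staging area and the second the working directory, which is why the M moves from the second column to the first when the file is staged.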

You now have a bona fide Git repository and a checkout, or working copy, of the files for that project. You need to make some changes and commit snapshots of those changes into your repository each time the project reaches a state you want to record. Each file in your working directory can be in one of two states: tracked or untracked. Tracked files are files that were in the last snapshot; they can be unmodified, modified, or staged. Untracked files are everything else: any files in your working directory that were not in your last snapshot and are not in your staging area. When you first clone a repository, all of your files will be tracked and unmodified, because you just checked them out and haven't edited anything.

The main tool you use to determine which files are in which state is the git status command.

Now, I want to talk about branching in Git. Branching just means you diverge from the main line of development and continue to do work without messing with that main line. It's an important feature of version control systems because it allows you to implement new and experimental features without having your untested code in the mainline source tree. Branching in Git is sometimes referred to as its killer feature. The reason is that, in many VCSs, branching requires you to copy the entire source tree. Git's branching mechanism is very lightweight and encourages workflows that branch and merge often.

To understand branching in Git, we first need to understand how Git actually stores the content you've committed. When you make a commit in Git, the system stores a commit object that contains a pointer to the snapshot of the content you staged. This object also contains the author's name and email, the message that you typed, and pointers to the commit or commits that directly came before this commit (its parent or parents): zero parents for the initial commit, one parent for a normal commit, and multiple parents for a commit that results from a merge of two or more branches.

To visualize this, let's assume that you have a directory containing three files, and you stage them all and commit. Staging the files checksums each one, stores that version of the file in the Git repository (Git refers to them as blobs), and adds that checksum to the staging area. When you create the commit by running git commit, Git checksums each subdirectory (in this case, just the root project directory) and stores those tree objects in the Git repository. Git then creates a commit object that has the metadata and a pointer to the root project tree, so it can re-create that snapshot when needed.

If you make some changes and commit again, the next commit stores a pointer to the commit that came immediately before it.
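If you want to look at these objects yourself, git cat-file can print them. This is a rough sketch in a throwaway repository; the three file names, their contents, and the demo identity are assumptions for illustration.

```shell
set -e
repo=$(mktemp -d)                        # throwaway demo directory
cd "$repo"
git init -q
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"

# Three files, staged and committed together.
echo "readme" > README
echo "puts 'hi'" > test.rb
echo "license" > LICENSE
git add README test.rb LICENSE
git commit -q -m "Initial commit"
first=$(git rev-parse HEAD)

git cat-file -t HEAD                     # the object type of the commit
git cat-file -p HEAD                     # tree pointer, author, committer, message
git cat-file -p 'HEAD^{tree}'            # the root tree: one blob entry per file

# A second commit records the first one as its parent.
echo "more" >> README
git commit -q -a -m "Second commit"
git cat-file -p HEAD                     # now also contains a "parent" line
```

Notice that the first commit has no parent line at all, while the second commit names the first by its checksum.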

A branch in Git is simply a lightweight, movable pointer to one of these commits. The default branch name in Git is master. As you start making commits, you're given a master branch that points to the last commit you made. Every time you commit, it moves forward automatically.

What happens if you create a new branch? Well, doing so creates a new pointer for you to move around. Let's say you create a new branch called testing. You do this with the git branch command. This creates a new pointer to the same commit you're currently on.

How does Git know what branch you're currently on? It keeps a special pointer called HEAD. In Git, this is a pointer to the local branch you're currently on. In this case, you're still on master. The git branch command only created a new branch; it didn't switch to that branch. Try git log --oneline --decorate to see where the branch pointers are.

To switch to an existing branch, you run the git checkout command.
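Here is a small sketch of creating a branch and then switching to it, in a throwaway repository. The file name and demo identity are made up; git symbolic-ref is used only to name the initial branch master (as in these notes) and to show where HEAD points.

```shell
set -e
repo=$(mktemp -d)                        # throwaway demo directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "First commit"

git branch testing                       # new pointer to the current commit
git symbolic-ref --short HEAD            # HEAD still points at master
git log --oneline --decorate             # master and testing sit on the same commit

git checkout -q testing                  # now move HEAD to the testing branch
git symbolic-ref --short HEAD            # HEAD points at testing
```

Creating the branch and switching to it are two separate steps here, which is exactly why HEAD stays on master after git branch.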

What is the significance of that? Well, let's do another commit. This is interesting, because now your testing branch has moved forward, but your master branch still points to the commit you were on when you ran git checkout to switch branches.

Let's switch back to master. That command did two things. It moved the HEAD pointer back to point to the master branch, and it reverted the files in your working directory back to the snapshot that master points to. This also means the changes you make from this point forward will diverge from an older version of the project. It essentially rewinds the work you've done in your testing branch so you can go in a different direction.

Let's make a few changes and commit again. Now your project history has diverged. You created and switched to a branch, did some work on it, and then switched back to your main branch and did other work. Both of those changes are isolated in separate branches: you can switch back and forth between the branches and merge them together when you're ready. And you did all that with simple branch, checkout, and commit commands. Take a break to show these commands.
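The whole sequence can be replayed in a throwaway repository, roughly like this. The file names, messages, and demo identity are assumptions for the sake of the example.

```shell
set -e
repo=$(mktemp -d)                        # throwaway demo directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "First commit"

git branch testing
git checkout -q testing
echo "experiment" >> file.txt
git commit -q -a -m "Work on testing"    # testing moves forward, master stays put

git checkout -q master                   # working directory goes back to master's snapshot
echo "mainline" > other.txt
git add other.txt
git commit -q -m "Work on master"        # master moves forward in a different direction

git log --oneline --decorate --graph --all   # shows the two diverging lines of history
```

The --graph and --all options make git log draw both branches, so the fork at the first commit is visible.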

Next, let's talk about how you can use branching and then merge two branches of development together. First, consider an example of how you might use these features in the real world. Say your workflow is like this:

1. Do work on a website
2. Create a branch for the new story you're working on
3. Do some work in the new branch

At this stage, you receive a call that another issue is critical and needs a hotfix. So you:

1. Switch to the production branch
2. Create a branch to add the hotfix
3. After testing, merge the hotfix branch, and push to production
4. Switch back to the original story and continue working

Now, let's go through a basic example of how you might use branching in the real world.

So, let's say you're working on your project and have a couple of commits already. This is our project: we've got a few commits on the master branch.

You've decided that you're going to work on issue #53 in whatever issue tracking system your company uses. To create a branch and switch to it at the same time, you can run the git checkout command with the -b switch. This is shorthand for:

$ git branch iss53
$ git checkout iss53

You work on your website and do some commits. Doing so moves the iss53 branch forward, because you have it checked out (that is, your HEAD is pointing to it). Now you get the call that there is an issue with the website, and you need to fix it immediately. With Git, you don't have to deploy your fix along with the iss53 changes you've made, and you don't have to put a lot of effort into reverting those changes before you can work on applying your fix to what is in production. All you have to do is switch back to your master branch.

One important note is that before you switch back, you need to ensure that your working area and staging area do not have any uncommitted changes that conflict with the branch you're switching to. If they do, Git won't let you switch. So, it's best to have a clean working area when you switch. There are ways to get around it (like stashing your current changes or amending a previous commit), but, for now, just assume you have committed all your work, so you can switch back to the master branch.

OK, so you switch back to master. Remember, this adds, replaces, and modifies files so your working directory looks like it did on the last commit of your master branch. And now you want to implement the hotfix. So, you create a hotfix branch that you can work in until the work is complete. Then you do the work and run some tests, and when you're satisfied, you commit it.

Now, to deploy the hotfix code, you have to merge it back into your master branch. You do this with the merge command. So, first, switch back to master. Then use merge to merge the changes from hotfix into the master branch. Notice the phrase fast-forward in that merge. Because the commit pointed to by the branch you merged in was directly ahead of the commit you're on, Git simply moves the pointer forward. Another way of saying that is that when you try to merge one commit with a commit that can be reached by following the first commit's history, Git simplifies things by moving the pointer forward, because there is no divergent work to merge together. This is called a fast-forward.

After deploying the fix, you're ready to switch back and continue working on the issue you were working on before. Before you do that, since you've already fixed and merged the hotfix, you can go ahead and delete the hotfix branch. You do this with the -d option on the branch command.
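The hotfix steps look roughly like this in a throwaway repository. The file contents, messages, and demo identity are assumptions; the branch names follow the example in these notes.

```shell
set -e
repo=$(mktemp -d)                        # throwaway demo directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"
echo "<a href='/contact'>Contact</a>" > index.html
git add index.html
git commit -q -m "Initial site"
before=$(git rev-parse master)

git checkout -q -b hotfix                # create the branch and switch to it in one step
echo "<a href='/help'>Help</a>" >> index.html
git commit -q -a -m "Fix broken link"

git checkout -q master
git merge hotfix                         # output mentions "Fast-forward"
git branch -d hotfix                     # the pointer is no longer needed
git log --oneline --decorate             # master now sits on the hotfix commit
```

No new commit is created by this merge: master simply moves forward to the commit that hotfix pointed to.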

Now you can switch back to your work-in-progress branch on issue #53 and continue working on it. So, you use checkout to switch back to the iss53 branch, and you make a few more changes and commit again. Now, it's important to note that the work you did in your hotfix branch is not contained in the files in your iss53 branch. If you need to pull it in, you can merge your master branch into your iss53 branch by running 'git merge master', or you can wait to integrate those changes until you decide to pull the iss53 branch back into master later.

So, let's say you've decided that your issue #53 work is complete and ready to be merged into your master branch. In order to do that, you'll merge your iss53 branch into master, much like you merged your hotfix branch earlier. All you have to do is check out the branch you wish to merge into and then run the git merge command. So, you use the commands shown here: switch to the master branch and merge iss53 into master.

Now, this merge is a bit more complicated than the merge we did before. Notice that, in this case, your development history has diverged from some older, common ancestor. Because the commit on the branch you're on isn't a direct ancestor of the branch you're merging in, Git handles this merge a bit differently. In this case, Git does a simple three-way merge, using the two snapshots pointed to by the branch tips and the common ancestor of the two.

So, instead of just moving the branch pointer forward, Git creates a new snapshot that results from this three-way merge and automatically creates a new commit (C6) that points to it. This is referred to as a merge commit, and it is special in that it has more than one parent. It's worth pointing out that Git determines the best common ancestor to use for its merge base. This is different from older tools like CVS or Subversion (before version 1.5), where the developer doing the merge had to figure out the best merge base for themselves. This makes merging a heck of a lot easier in Git than in these other systems.
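A three-way merge can be sketched like this in a throwaway repository. The two branches touch different files, so the merge is clean; the file names, messages, and demo identity are made up for illustration.

```shell
set -e
repo=$(mktemp -d)                        # throwaway demo directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"
echo "home" > index.html
git add index.html
git commit -q -m "Initial site"          # the common ancestor

git checkout -q -b iss53
echo "footer" > footer.html
git add footer.html
git commit -q -m "Add footer for issue 53"

git checkout -q master
echo "contact" > contact.html
git add contact.html
git commit -q -m "Apply hotfix"          # master has now diverged from iss53

git merge -q --no-edit iss53             # not a fast-forward: Git makes a merge commit
git log --oneline --graph                # the graph shows the two lines joining
git rev-list --parents -n 1 HEAD         # merge commit ID followed by its two parent IDs
```

The --no-edit option just accepts the default merge message, so the sketch runs without opening an editor.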

Occasionally, this process doesn't go smoothly. If you changed the same part of the same file differently in the two branches you're merging together, Git won't be able to merge them cleanly.

When a conflict occurs, Git pauses the merge and does not create a merge commit. If you look at your status, you can see which files were unmerged. Anything that has merge conflicts and hasn't been resolved is listed as unmerged.

Git adds standard conflict-resolution markers to the files that have conflicts, so you can open them manually and resolve those conflicts. Your file contains a section marked off by <<<<<<<, =======, and >>>>>>> lines (see slide). The version in HEAD (your master branch, because that was what you had checked out when you ran your merge command) is the top part of that block (everything above the =======), while the version in your iss53 branch is everything in the bottom part.

In order to resolve the conflict, you have to either choose one side or the other or merge the contents yourself. You can also run git mergetool, which launches a graphical merge tool to help you resolve the conflicts. After you've resolved each of these sections in each conflicted file, run git add on each file; staging the file marks it as resolved in Git. Then you simply use commit to create the merge commit.
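Here is a rough sketch that provokes a conflict on purpose and then resolves it by hand, in a throwaway repository. The file contents, the chosen resolution, and the demo identity are all assumptions for illustration.

```shell
set -e
repo=$(mktemp -d)                        # throwaway demo directory
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"
echo "contact: support@example.com" > index.html
git add index.html
git commit -q -m "Initial site"

git checkout -q -b iss53
echo "contact: email.support@example.com" > index.html
git commit -q -a -m "Change contact on iss53"

git checkout -q master
echo "contact: help@example.com" > index.html
git commit -q -a -m "Change contact on master"   # same line changed differently

git merge iss53 || true                  # merge stops; "|| true" keeps the script going
conflict=$(git status --short)           # the file is reported as unmerged
markers=$(grep -c '^=======' index.html) # the separator between the two versions
cat index.html                           # HEAD's version on top, iss53's version below

echo "contact: help@example.com" > index.html    # resolve by keeping one side
git add index.html                       # staging marks the conflict as resolved
git commit -q -m "Merge branch 'iss53'"  # concludes the merge
```

Until the final commit, the repository stays in the paused merge state, so you can take as long as you need to sort out each conflicted file.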

Next I want to talk about Git workflows. You can think of a Git workflow as a standard set of best practices for developing a project under Git version control. The workflow defines things like how you will use branching and merging to add new features or implement bug fixes, and how to deploy your code from a stable branch in your repo. These workflows can be useful for development teams because, depending on how you set them up, they can encourage development practices you might want for your project, such as feature-driven development, code reviews, and continuous delivery of working software. Also, since these workflows define a standard set of practices, they can be applied across the entire development team or across different projects within an organization. Once you settle on a workflow you like, you can use it over and over again on your next projects without having to think about it.

There are a number of different Git workflows that are suggested and documented online. For our class, I'll cover one of the simplest, called GitHub flow. You don't have to use this for your project, but you can. Even if you don't use this particular workflow, I do encourage you to pick some standardized workflow for your project so you're not just arbitrarily branching and merging with no clear idea of which branches are stable and which are in development.

This slide lists all the steps for the workflow. Let's go through each of these. The first rule says that anything you put into the master branch is deployable. The authors of GitHub flow say that this is the only hard rule in the workflow. So, basically, if you're using this workflow, the master branch always needs to be stable, and it should be safe to deploy or create new branches off of it. If you put code into the master branch that doesn't work or breaks the build, you should feel bad.

The next rule says that when you want to start anything new, be it a feature or a bug fix, you should create a descriptively named branch off of master to do your development. This helps organize your project and allows you to see exactly what is currently being worked on in the repo by looking at the branches. For each branch, you should commit to that branch locally and regularly push your work to the same-named branch on the server. So, basically, this rule encourages saving and committing your work often. As long as we're not committing to master, we can commit code that might not work or hasn't been fully tested.

Next, when you've done some amount of work and would like feedback from the rest of your team, or you think the branch is ready to merge, you should open what's known as a merge request or pull request. This is essentially a tool within GitHub or GitLab that allows you to ask someone on your team for a review before merging the code. In this workflow, you typically don't merge code directly into master, but rather ask someone else for sign-off first. Once that person has reviewed the feature and feels comfortable with the changes, you can go ahead and merge it into the master branch. And finally, once your feature has been pushed to the master branch, that code can be deployed immediately.
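The Git side of this workflow can be sketched locally. In this sketch a bare repository in a temporary directory stands in for the GitHub or GitLab server, and a plain git merge stands in for merging an approved pull request; both are assumptions made so the example can run without a hosting service. The branch name, file names, and demo identity are made up as well.

```shell
set -e
work=$(mktemp -d)                        # throwaway demo directory
cd "$work"
git init -q --bare server.git            # stand-in for the hosted repository
git init -q project
cd project
git symbolic-ref HEAD refs/heads/master  # make sure the first branch is called master
git config user.name "Demo User"         # hypothetical identity for the demo
git config user.email "demo@example.com"
git remote add origin "$work/server.git"

echo "home" > index.html
git add index.html
git commit -q -m "Initial site"
git push -q -u origin master             # master on the server is the deployable line

git checkout -q -b add-contact-page      # descriptively named branch off master
echo "contact" > contact.html
git add contact.html
git commit -q -m "Add contact page"
git push -q -u origin add-contact-page   # push to the same-named branch on the server

# (The review would happen here, in a pull or merge request on the hosting service.)

git checkout -q master
git merge -q --no-ff --no-edit add-contact-page  # local stand-in for merging the request
git push -q origin master                # master is updated and can be deployed
```

On a real hosting service the last merge would normally be done through the pull request page rather than on your own machine.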