GIT SERVER PERFORMANCE OPTIMIZATION USING GIT-ANNEX MEOR NUR HASYIM BIN MEOR AZIZ BACHELOR OF COMPUTER SCIENCE

GIT SERVER PERFORMANCE OPTIMIZATION USING GIT-ANNEX

MEOR NUR HASYIM BIN MEOR AZIZ

BACHELOR OF COMPUTER SCIENCE (COMPUTER NETWORK SECURITY) WITH HONOURS

FACULTY OF INFORMATICS AND COMPUTING
UNIVERSITI SULTAN ZAINAL ABIDIN
AUGUST 2018

DECLARATION

I declare that this thesis is based on my original work except for quotations and citations, which have been duly acknowledged. I also declare that it has not been previously or concurrently submitted for any other degree at UniSZA or any other institution.

(Meor Nur Hasyim Bin Meor Aziz)
Date: ...

CONFIRMATION

The project report titled Git Server Performance Optimization Using Git-annex, submitted to the Faculty of Informatics and Computing, Universiti Sultan Zainal Abidin, by Meor Nur Hasyim Bin Meor Aziz (BTBL15040131), has been found satisfactory in terms of scope, quality and presentation as partial fulfilment of the requirements for the degree of Bachelor of Computer Science in Computer Network Security.

(Dr Aznida Hayati Binti Zakaria@Mohamad)
Date:

ACKNOWLEDGEMENT

Alhamdulillah. All thanks to Allah SWT, by whose will I was given the opportunity to complete this thesis. Firstly, I would like to express my thanks to my supervisor, Dr Aznida Hayati Binti Zakaria@Mohamad, who guided and supported me throughout this project with her patience and knowledge. My deepest thanks go to my parents, family members and all my friends for their cooperation and encouragement from the beginning to the end. I also place on record my gratitude to everyone who, directly or indirectly, lent a hand in this venture.

ABSTRACT

This project implements git-annex to improve the performance of a git server. When a git server has too many files in its repository, its performance is affected. This in turn lowers the productivity of the development team that uses the git server. Studying the concepts of version control software and building a version control server helps in understanding the complexity of the version control environment. This project is intended to help users of version control software implement git-annex to improve their server performance.

ABSTRAK

This project applies git-annex to improve the performance of a git server. When a git server holds too much data in its repository, its performance begins to drop. This reduces the productivity of the development team that uses the git server. By studying the concepts of version control software and building a version control server, it is hoped that such software can be understood in greater depth. This project is also expected to help users who want to apply git-annex to improve the performance of their git servers.

TABLE OF CONTENT

DECLARATION
CONFIRMATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS

CHAPTER 1 INTRODUCTION
1.1 Project Background
1.2 Problem Statement
1.3 Objectives
1.4 Scope
1.5 Limitation of Work
1.6 Expected Result
1.7 Report Structure

CHAPTER 2 LITERATURE REVIEW
2.1 Introduction
2.2 Related Articles
2.2.1 WordPress Based on Git Implementation
2.2.2 Version Control Software using Virtual Machine
2.2.3 Types of Version Control Software
2.2.4 Git Server Practices
2.2.5 Git-Annex
2.3 Chapter Summary

CHAPTER 3 METHODOLOGY
3.1 Introduction
3.2 Project Framework
3.2.1 Phase 1: Installation Setup
3.2.2 Phase 2: Git Server Performance Testing Process
3.2.3 Phase 3: Result Comparison and Analysis
3.3 Hardware and Software Requirements
3.3.1 Hardware Requirements
3.3.2 Software Requirements
3.4 Chapter Summary

CHAPTER 4 IMPLEMENTATION & TESTING
4.1 Introduction
4.2 Installation Pre Requisite
4.2.1 Static IP
4.2.3 Putty
4.3 Phase 1 Project: Installation Setup
4.3.1 Installation of Git
4.3.1.1 Installation of Git on the Server
4.3.1.2 Installation of Git on the Local Machine
4.3.2 Installation of Git-annex
4.3.2.1 Installation of Git-annex on the Server
4.3.2.2 Installation of Git-annex on the Local machine
4.4 Working on the Command
4.4.1 Basic Git Command
4.4.2 Basic Git-annex Command
4.5 Phase 2 Project: Git Server Performance Testing Process
4.5.1 Test Environment Setup
4.5.2 Performance Test Workflow
4.5.2.1 Pushing test
4.5.2.2 Pulling Test
4.5.2.3 Test Command
4.5.2.3.1 Standard Git Command
4.5.2.3.2 Git-annex Command
4.5.2.3.3 Linux Command
4.5.2.4 Test Subject
4.5.2.5 Test Workflow
4.5.3 Pushing and Pulling Performance Test
4.5.3.1 Standard Git Pushing Performance Test
4.5.3.2 Standard Git Pulling Performance Test
4.5.3.3 Git-annex Pushing Performance Test
4.5.3.2 Git-annex Pulling Performance Test
4.5.3.3 Pushing and Pulling Performance Test Overview
4.6 Phase 3: Results Comparison and Analysis
4.6.1 Standard Git Setup Pushing Performance Test Results
4.6.2 Standard Git Setup Pulling Performance Test Results
4.6.3 Git-annex Setup Pushing Performance Test Results
4.6.4 Git-annex Setup Pulling Performance Test Results
4.7 Graph Visualization Based On Results
4.7.1 Average Pushing Real Time Test Graph
4.7.2 Average Pulling Real Time Test Graph
4.7.3 Average Pushing CPU Time Test Graph
4.7.4 Average Pulling CPU Time Test Graph
4.8 Chapter Summary

CHAPTER 5 CONCLUSION
5.1 Introduction
5.2 Project Findings and Results Analysis
5.3 Project Constraints and Limitation
5.4 Future Work and Recommendation
5.5 Chapter Summary

REFERENCES
APPENDICES

LIST OF FIGURES

FIGURE  TITLE
3.1   Project Framework
3.2   Performance Test Result Example
4.1   Editing the dhcpcd.conf File
4.2   PuTTY Interface
4.3   Initializing Git in the Server
4.4   Creating gitclassic Directory
4.5   Initializing Git in the Local Machine
4.6   Installing Git-annex on the Server
4.7   Initializing Git on the Server
4.8   Initializing Git-annex on the server
4.9   Checking annex directory
4.10  Initializing Git on the Local Machine
4.11  Initializing Git-annex on the Local Machine
4.12  Checking Annex directory on Linux Ubuntu
4.13  Test Environment Setup
4.14  Time Command Output
4.15  List of Test Files
4.16  Test Workflow
4.17  Copying 10MB File
4.18  Adding 10MB File to the Git Repository
4.19  Committing Message for File Changes
4.20  Adding Connection to the Server
4.21  Pushing the File to the Server
4.22  Creating Directory and Initializing Git
4.23  Adding Connection to the Server
4.24  Pulling Test Output
4.25  Copying 10MB File
4.26  Adding File to the Git-annex Repository
4.27  Committing Message to File Changes
4.28  Adding Connection to the Server
4.29  Pushing Test for Git-annex Output
4.30  Creating Directory and Initializing Git
4.31  Adding Connection to the Server
4.32  Adding Connection to the Server
4.33  Pushing and Pulling Performance Test Overview
4.34  Average Pushing Real Time Test Graph
4.35  Average Pulling Real Time Test Graph
4.36  Average Pushing CPU Time Test Graph
4.37  Average Pulling CPU Time Test Graph

LIST OF TABLES

TABLE  TITLE
3.1   List of hardware requirements for the server
3.2   List of hardware requirements for the local machine
3.3   List of software requirements
4.1   Standard git setup pushing test results for 10MB file size
4.2   Standard git setup pushing test results for 20MB file size
4.3   Standard git setup pushing test results for 50MB file size
4.4   Standard git setup pushing test results for 100MB file size
4.5   Standard git setup pushing test results for 200MB file size
4.6   Standard git setup pulling test results for 10MB file size
4.7   Standard git setup pulling test results for 20MB file size
4.8   Standard git setup pulling test results for 50MB file size
4.9   Standard git setup pulling test results for 100MB file size
4.10  Standard git setup pulling test results for 200MB file size
4.11  Git-annex setup pushing test results for 10MB file size
4.12  Git-annex setup pushing test results for 20MB file size
4.13  Git-annex setup pushing test results for 50MB file size
4.14  Git-annex setup pushing test results for 100MB file size
4.15  Git-annex setup pushing test results for 200MB file size
4.16  Git-annex setup pulling test results for 10MB file size
4.17  Git-annex setup pulling test results for 20MB file size
4.18  Git-annex setup pulling test results for 50MB file size
4.19  Git-annex setup pulling test results for 100MB file size
4.20  Git-annex setup pulling test results for 200MB file size

LIST OF ABBREVIATIONS

VCS  Version Control Software
SVN  Subversion (Apache)
OS   Operating System

CHAPTER 1 INTRODUCTION

1.1 Project Background

Git is version control software that manages and keeps track of modifications to an application's code and configuration [6]. Version control is important to software developers because it lets them compare files, identify differences, and merge changes when needed. When code has an issue that needs troubleshooting, developers can compare the last working file with the faulty one, which reduces the time spent identifying the cause. There are many types of version control software, such as Mercurial, Git and Apache Subversion [3].

Git-annex is a distributed file synchronization system that aims to solve the problem of sharing and synchronizing collections of large files [7]. It is installed on a git server to reduce the performance slowdown caused by large files stored on the server. Git-annex runs in the background to automate the synchronization of repositories.

1.2 Problem Statement

A git server is very important to a development team because it helps the team manage the changes made to the code. Performance issues should therefore not be taken lightly, as any slowdown affects the team's productivity. As more code is stored on the git server, performance problems arise. Git-annex will be used as a technique to address this problem.

To achieve this, a version control server needs to be built. The project also requires studying the concepts of version control software to find a suitable platform. Using a Raspberry Pi makes it possible to rely on open source software throughout, including the operating system and the version control software itself. It also keeps the project's cost low while still meeting the project's goals and demands.

1.3 Objectives

1. To study the concept of a version control software server.
2. To develop a version control server based on Git.
3. To implement git-annex as the solution to improve the performance of a git server.

1.4 Scope

The scope covered in this project:

1. Git: the only version control software that will be used.
2. Git-annex: the solution to improve the git server performance.
3. Raspberry Pi: acts as the physical hardware of the git server.

1.5 Limitation of Work

The limitation of this project:

1. The Raspberry Pi has to be monitored properly to keep the project running smoothly; otherwise the results and output of the project will be affected.

1.6 Expected Result

1. A working Git server running on a Raspberry Pi.
2. Git-annex will give the git server a performance improvement once implemented.

1.7 Report Structure

The first chapter of this report is the introduction to the project, which includes the background, problem statement, objectives and scope. The overall logic of the project is stated here. The second chapter is the literature review, which provides an understanding of the related research. The third chapter describes the methodology used in this project.

CHAPTER 2 LITERATURE REVIEW

2.1 Introduction

This chapter discusses related research papers, articles, and techniques or methods. The literature review gives a bigger picture of the field and deepens the understanding and knowledge needed for the project. Research papers related to version control software, its different types and architectures, and tools for improving its performance are covered.

2.2 Related Articles

2.2.1 WordPress Based on Git Implementation

This thesis presents preparations and good practices for independent web development [1]. The thesis does not have a client, but its end product can be used for assignments from actual clients. The aim of the thesis is to prepare two computers as web servers and to present the version control software Git. For Git, the thesis covers the most basic usage and operations, mainly those relating to web development. At the end of the thesis, the web servers meant for development are ready for use and Git version control is in use. The Git repository contains a clean installation of the content management system WordPress. GitHub is used as the service provider for the Git server. The version-controlled WordPress base can be moved to any Debian-based web server that fulfils the requirements of WordPress. By following the instructions of the thesis, it should be possible to create a web server and to use Git. The thesis leaves room for further development: it could be continued by creating the actual website for which the preparations are made, and by automating some of the actions presented.

2.2.2 Version Control Software using Virtual Machine

The objective of this research is to design and implement an improved version control software server [2]. A virtual machine is installed as the basic setup for the project. The article also mentions that a virtual machine makes it possible to create technically difficult and advanced setups without interfering with the current server environment, because the server environment can be developed in stages and tested outside the production server. The article further notes that a virtual machine uses plenty of system resources, which may cause problems later. As an alternative to a virtual machine, an application container can be used to reduce the use of system resources.

2.2.3 Types of Version Control Software

This paper discusses three types of version control software: Mercurial, Git, and Apache Subversion (SVN) [3]. The paper further discusses the commands used in each of them. It also describes Git as a distributed revision control and source code management system with an emphasis on speed, and explains basic Git terminology such as trees, commits, branches, clone, pull, push and revision.

People often confuse Git and GitHub, thinking they are one and the same; however, there are real differences between them. Although you can run your own Git server locally, GitHub is a remote server, a community of developers, and a graphical web interface for managing a Git project. It is free to use for up to 5 public repositories. GitHub has recently established Git as a great version control system, providing a good front end to numerous large projects, such as Rails and Prototype.

2.2.4 Git Server Practices

This paper gives examples of best practices for running a git server efficiently [4]. The best practices are general advice for effective software development using version control. One of them is to explain commits completely. Every version control tool provides a way to include a log message (a comment) when committing changes to the repository. This comment is important. If we consistently use good comments when we commit, the repository's history contains not only every change we have ever made but also an explanation of why those changes happened. These kinds of records can be invaluable later, as we forget things. Other practices include building and testing the code after every commit.

2.2.5 Git-Annex

This paper describes a tool called git-annex that was developed to address some problems of storing large files in git repositories and to allow controlled transfers of restricted datasets [5]. The tool uses a storage format similar to git, but keeps the large object store separate from the regular git object store. The storage format of git-annex has been modified to allow handling of very large files, on the order of multiple terabytes. However, git-annex still maintains full metadata for the large files in the main git repository, so git can still record all manipulations done to these large files and maintain the integrity of the repository through the use of checksums. This is achieved by git-annex checking links to the hashes of the large files into the git repository.

Such a scheme also allows research groups at different organizations to share the code used for processing the datasets without sharing the datasets themselves. At the same time, both parties can obtain the same dataset from the proper authority and independently inject it into their repository. All of this can be achieved through standard git commands, without tedious and error-prone verification of long checksums.
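To make "links to the hashes" more concrete: git-annex identifies file content by a key built from the file's size and checksum, and with its default SHA256E backend such a key has the shape SHA256E-s<size>--<sha256><extension>. The sketch below builds a key of that shape by hand. It is an illustration only, not git-annex's own code: the function name annex_style_key is hypothetical, the extension rule is a simplification of what git-annex really does, and GNU sha256sum is assumed to be available.

```shell
#!/bin/sh
# annex_style_key FILE
# Print a key shaped like the ones git-annex's SHA256E backend produces:
#   SHA256E-s<size in bytes>--<sha256 of content><extension>
# Hypothetical helper for illustration; git-annex applies stricter
# rules when deciding which extension to keep.
annex_style_key() {
    file=$1
    size=$(wc -c < "$file" | tr -d ' ')           # content size in bytes
    hash=$(sha256sum "$file" | cut -d ' ' -f 1)   # content checksum
    name=$(basename "$file")
    case $name in
        *.*) ext=".${name##*.}" ;;                # keep the last extension
        *)   ext="" ;;
    esac
    printf 'SHA256E-s%s--%s%s\n' "$size" "$hash" "$ext"
}
```

Because the key depends only on the content, two groups that obtain the same dataset independently end up with the same key, which is what lets the small link stored in git stand in for the large file.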

2.3 Chapter Summary

This chapter discussed the literature reviewed during the feasibility study. The literature review helps the developer discover the problems of previous techniques or approaches that need to be improved and overcome in this project. It also helps in gaining an understanding of the project under development.

CHAPTER 3 METHODOLOGY

3.1 Introduction

This chapter discusses the methodology used throughout the project. To ensure the project works efficiently and as planned, it has to go through each phase of an appropriate methodology.

3.2 Project Framework

The project has to go through several processes before the analysis at the end of the project can be done. A framework is therefore designed to show the flow of the process and to ensure it can be carried out as proposed. The methodology framework is divided into 3 main phases:

a) Phase 1: Installation Setup
b) Phase 2: Git Server Performance Testing Process
c) Phase 3: Result Comparison and Analysis

Figure 3.1: Project Framework (Phase 1: Installation setup; Phase 2: Git server performance testing process; Phase 3: Result comparison and analysis. Diagram labels: Git server implementation, Git annex implementation, Performance testing, Produce result.)

3.2.1 Phase 1: Installation Setup

Installation setup is the first phase of the project. Raspbian OS will be installed as the Raspberry Pi operating system. Git will be installed on the server and the local machine. Git-annex will also be installed on the server and the local machine.

3.2.2 Phase 2: Git Server Performance Testing Process

This phase carries out the performance tests. There are two performance tests for the git server. The first test looks at the git server without git-annex. The performance test is based on the time taken by the standard git setup, and later by the git-annex setup, to upload files to and download files from the server. An example of the performance test is shown below:

Figure 3.2: Performance Test Result Example

The result of the performance test will be recorded. The second test looks at the performance of the git server with git-annex implemented. Git-annex is installed while keeping the same Raspbian operating system and system settings as in the previous test. The git server with git-annex will undergo the same performance test, and the result will be recorded for use in Phase 3. Both tests will be run 5 times with each of 5 different file sizes, and average values will be obtained. This is to ensure the consistency of the performance test results.
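The repeat-and-average step described above could be scripted along the following lines. This is a minimal sketch under stated assumptions, not the procedure used in the project: the function names average and time_run are hypothetical, and time_run assumes GNU date, which supports the %N (nanoseconds) format.

```shell
#!/bin/sh
# average N1 N2 ...
# Print the arithmetic mean of the arguments, to three decimals.
average() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" |
        awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}

# time_run COMMAND [ARGS...]
# Run a command once and print its wall-clock duration in seconds.
# Assumes GNU date (the %N format is not available everywhere).
time_run() {
    start=$(date +%s.%N)
    "$@" > /dev/null 2>&1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f\n", e - s }'
}

# Example: average five recorded timings (made-up numbers, in seconds).
average 12.4 11.9 12.1 12.6 12.0
```

A test run would call time_run once per repetition (for example around a git push), collect the five values, and pass them to average.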

3.2.3 Phase 3: Result Comparison and Analysis

In this phase, the performance results with and without git-annex are collected for analysis and compared. The values from the results are the deciding factor in concluding whether or not git-annex can improve the performance of the git server.

3.3 Hardware and Software Requirements

This section lists all hardware and software involved in the development of this project. All of these elements are important to the process.

3.3.1 Hardware Requirements

List of hardware used for the server:

No.  Hardware                   Type
1.   Computer                   Raspberry Pi Model B
2.   Processor                  Broadcom 2837
3.   Memory                     1 GB RAM
4.   Operating System Type      Raspbian OS
5.   Operating System Storage   16 GB Samsung SD Card
6.   Repository Storage         16 GB SanDisk Pendrive
7.   System Type                64-bit

Table 3.1 List of hardware requirements for the server

List of hardware used for the local machine:

No.  Hardware                   Type
1.   Computer                   MSI CX61 Laptop
2.   Processor                  Intel Core i7-4712MQ
3.   Memory                     3 GB RAM
4.   Operating System Type      Linux Ubuntu on VMware Workstation Pro
5.   Operating System Storage   20 GB Samsung SSD
6.   Repository Storage         20 GB Samsung SSD
7.   System Type                64-bit

Table 3.2 List of hardware requirements for the local machine

3.3.2 Software Requirements

List of software used:

No.  Software                 Type
1.   Raspbian OS              Operating system for the project.
2.   Git                      Version control software used.
3.   Git-annex                Software to improve the performance of the git server.
4.   Microsoft Word 2013      Used to prepare the report documentation.
5.   VMware Workstation Pro   Hosts Linux Ubuntu for the local machine.
6.   PuTTY                    Used to access the server remotely.

Table 3.3 List of software requirements

3.4 Chapter Summary

The phases and processes explained in this chapter have been carefully planned so that the aim of the project can be achieved. The detailed explanation and the output of the project are given in the next chapter.

CHAPTER 4 IMPLEMENTATION AND TESTING

4.1 Introduction

This chapter covers the installation of git on the server and the local machine, the basic git-related commands, the workflow of the performance test with a full walkthrough, and the full results of the performance test. Based on the results, graphs are shown to directly compare the performance of standard git and git-annex.

4.2 Installation Pre Requisite

4.2.1 Static IP

A static Internet Protocol (IP) address is a number that is permanently assigned to a computer by an Internet service provider (ISP). A static IP address is also known as a fixed address. A computer that is assigned a static IP address uses the same IP address every time it connects to the Internet.

Using a static IP address for the server is very important in this project. With a static IP address assigned to the server, the local machine only ever has to enter one constant IP address to connect to it. If the server is not assigned a static IP address, there is always a chance that the local machine cannot reach the server, because the Raspberry Pi may be given a different IP address.

Assigning a static IP address to the Raspberry Pi is a simple step. By opening the dhcpcd.conf file, we can assign any static IP address we want. To edit this file, use the following command in the Raspbian operating system:

    sudo nano /etc/dhcpcd.conf

Figure 4.1: Editing the dhcpcd.conf File
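As a rough illustration of what goes into dhcpcd.conf, the sketch below prints a static-address stanza in the format dhcpcd uses. The helper name static_ip_stanza, the interface name and all addresses are hypothetical placeholders, not the values used in this project; substitute the ones for your own network.

```shell
#!/bin/sh
# static_ip_stanza IFACE ADDRESS/PREFIX ROUTER DNS
# Print a dhcpcd.conf stanza that pins IFACE to a fixed address.
# Hypothetical helper for illustration only.
static_ip_stanza() {
    printf 'interface %s\n' "$1"
    printf 'static ip_address=%s\n' "$2"
    printf 'static routers=%s\n' "$3"
    printf 'static domain_name_servers=%s\n' "$4"
}

# Placeholder values; replace with the ones for your own network.
static_ip_stanza eth0 192.168.1.50/24 192.168.1.1 192.168.1.1
```

The printed lines can be pasted at the end of /etc/dhcpcd.conf in the editor opened above; the new address takes effect after the Raspberry Pi is rebooted.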

4.2.3 PuTTY

PuTTY is a free and open source terminal emulator. It gives users control over the SSH encryption key, so they can remotely access computers over the internet. In this project, PuTTY is used to access the Raspbian operating system on the Raspberry Pi without having to connect a monitor. PuTTY is installed in the Windows operating system on the local machine. Using PuTTY keeps the project interface in one place, meaning that the Raspberry Pi and the virtual machine can both be accessed from one screen.

Using PuTTY is simple. First, we must know the IP address of the computer that we want to access and control. Enter the computer's IP address in the Host Name (or IP address) field. Select SSH as the connection type, since SSH is the protocol used for this connection. Click Open and the session starts.

Figure 4.2: PuTTY Interface

Figure 4.2 shows the PuTTY interface installed in the Windows operating system.

4.3 Phase 1 Project: Installation Setup

4.3.1 Installation of Git

4.3.1.1 Installation of Git on the Server

The server is installed on the Raspberry Pi. In the PuTTY SSH session to the Raspberry Pi, go into the usbdrv directory, which holds the git repository for the server, and initialize the git server:

    git init --bare

Figure 4.3: Initializing Git in the Server

4.3.1.2 Installation of Git on the Local Machine

Step 1: Create a directory for the git repository. We will name this directory gitclassic.

    mkdir gitclassic

Step 2: Go into the directory.

    cd gitclassic

Figure 4.4: Creating gitclassic Directory

Step 3: Initialize a git repository in the directory.

    git init

Figure 4.5: Initializing Git in the Local Machine

4.3.2 Installation of Git-annex

4.3.2.1 Installation of Git-annex on the Server

Step 1: Install git-annex in the Raspbian operating system.

    sudo apt-get install git-annex

Figure 4.6: Installing Git-annex on the Server

Step 2: Initialize the git server repository in a desired directory.

    git init --bare

Figure 4.7: Initializing Git on the Server

Step 3: Initialize git-annex in the same repository.

    git annex init

Figure 4.8: Initializing Git-annex on the server

Step 4: Check for the annex directory to make sure git-annex has been successfully initialized in the git repository.

    ls

Figure 4.9: Checking annex directory

4.3.2.2 Installation of Git-annex on the Local machine

Git-annex needs an initialized git repository; otherwise, no git-annex command will work at all. So, a directory needs to be created and git initialized in it.

Step 1: Install git-annex in the operating system.

    sudo apt-get install git-annex

Step 2: Create a directory for the git-annex git repository. The directory will be named gitannex. Initialize git in the directory.

    mkdir gitannex && cd gitannex
    git init

Figure 4.10: Initializing Git on the Local Machine

Step 3: Initialize git-annex.

    git annex init

Figure 4.11: Initializing Git-annex on the Local Machine

Opening the directory in the file manager on Linux Ubuntu, a folder named annex can be seen under the gitannex directory, indicating a successful git-annex installation in the git repository.

Figure 4.12: Checking Annex directory on Linux Ubuntu
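The same check can be done from the terminal. The sketch below is a minimal example that assumes git-annex keeps its state in a directory named annex, located at .git/annex in a normal repository and at the top level of a bare repository; the path $HOME/gitannex is only an assumed location for the repository created above.

```shell
# has_annex DIR
# Succeeds if DIR contains an annex state directory.
# Assumption: .git/annex for a normal repository, annex for a bare one.
has_annex() {
  [ -d "$1/.git/annex" ] || [ -d "$1/annex" ]
}

# Hypothetical repository location, for illustration only.
if has_annex "$HOME/gitannex"; then
  echo "git-annex is initialized"
else
  echo "git-annex is not initialized"
fi
```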

4.4 Working on the Command

4.4.1 Basic Git Command

Many git commands are used to configure and navigate the git ecosystem. The most commonly used commands are:

1. git init
   Creates a new local repository in a directory.
2. git add .
   Adds file changes in the working directory to the index.
3. git commit
   Commits the changes, with a message so that other users can see what was changed.
4. git push
   Sends changes to the master branch of the remote repository.
5. git status
   Lists files that have changed and files that still need to be committed.
6. git remote
   Connects the local repository to a remote repository so that files can be pushed to and pulled from it.

7. git pull
   Fetches and merges file changes from the remote repository into the local working directory.
8. git reset
   Resets the index and working directory to the state of the last commit.
9. git clone
   Creates a copy of a repository from a remote source.
10. git fsck
   Runs an integrity check of the git file system that identifies corrupted objects.

4.4.2 Basic Git-annex Command

Many of the git-annex commands cannot be used on a standard git repository without git-annex because of the different file system structure. The most commonly used git-annex commands are:

1. git annex init
   Initializes the git-annex features in the git repository.
2. git annex add .
   Adds the files to the annex repository.

3. git annex get .
   Retrieves the content of the files when it is needed.
4. git annex sync
   Syncs the metadata of the files, which is stored in git, with the remote repository.
5. git annex sync --content
   Syncs the content of the files with the remote repository.

4.5 Phase 2 Project: Git Server Performance Testing Process

4.5.1 Test Environment Setup

The setup of the project consists of only one Raspberry Pi 3 Model B and a laptop. The Raspberry Pi is the server, while the laptop acts as the local machine. The server has two interchangeable modes (with and without git-annex). The git repository on the server needs to be deleted before the next test is executed. For the local machine, two variants of the git repository, one without git-annex and one with git-annex, are installed in two separate Linux Ubuntu virtual machines in the same VMware Workstation Pro. The two Linux Ubuntu virtual machines need the same setup parameters to ensure that the results of the same type of test are consistent.

The setup parameters of each Linux Ubuntu virtual machine are fixed as below:

1. 3GB of RAM
2. 20GB of single disk storage
3. Same data storage location on the SSD of the laptop
4. 4 processors allocated
5. 1 processor core of the laptop allocated to VMware Workstation Pro

Most of the parameters of the Linux Ubuntu virtual machines need to be considered, since the performance of the git system depends heavily on the hardware it runs on. The results of both tests may vary if different hardware parameters are used for Linux Ubuntu. So, it is worth noting the setup parameters, since the results will be referred back to them.

Since only one Raspberry Pi is used, in two interchangeable modes, its hardware parameters do not need to be considered. The most important thing for the server is to delete all git repository data on the server before the next test is performed.

Figure 4.13: Test Environment Setup

Figure 4.13 shows how the overall test environment is set up. There is only one Raspberry Pi involved in the project. So, the standard git test is conducted first. Then, git-annex is installed and its own test is carried out.

4.5.2 Performance Test Workflow

The performance test workflow involves two types of test:

1. Pushing test
2. Pulling test

4.5.2.1 Pushing test

The pushing test is the test where files are added to the local git repository and then uploaded to the git repository server.

4.5.2.2 Pulling Test

The pulling test is the test where files that have been uploaded to the server are downloaded back to the local git repository.

Both tests examine how fast a file is written from the source to the destination. The time taken for each test to complete is recorded for the results analysis and comparison. Since the result of the first push may vary from the second one, each test is performed five times to get an average value of the time taken.

Each test uses a different set of commands. Both variants, the standard git setup and the git-annex setup, undergo these two tests. This means there are four specific performance tests:

1. Standard git pushing test
2. Standard git pulling test
3. Git-annex pushing test
4. Git-annex pulling test

4.5.2.3 Test Command

The standard git setup and the git-annex setup need different commands to undergo the test. Even though the commands differ, they have the same objective, which is to send the file between the server and the local machine.

4.5.2.3.1 Standard Git Command

For the standard git setup, the command used to push the file from the local git repository to the git server is:

    git push

The command used to pull the files from the server to the local machine is:

    git pull

4.5.2.3.2 Git-annex Command

The git-annex setup only needs one command for both the push and the pull test. This command cannot be used on the standard git setup, which does not recognize it because git-annex is not installed.

    git annex sync --content

This command makes sure the content of the files on the server is synchronized with the local machine.

4.5.2.3.3 Linux Command

A Linux command is combined with the git command to measure how long the test takes to complete. The Linux command is time, and it lets the project measure the exact duration of the test while the files are being pushed or pulled. An example of time command output:

Figure 4.14: Time Command Output

The real value is the wall-clock time: the elapsed time from the start to the end of the whole process. The user value is the CPU time in user mode, which is the amount of time the CPU spent executing the process itself. The sys value is the CPU time in kernel mode, which is the amount of CPU time spent in operating system functions on behalf of the command, such as memory allocation or hardware access.
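To make the three values concrete, the sketch below adds the user and sys values of a time report to obtain the total CPU time in seconds. It assumes the default output format of the Bash time keyword (for example, user 0m0.363s); the function name cpu_seconds is hypothetical.

```shell
# cpu_seconds
# Reads a time report on standard input and prints user + sys in seconds.
# Assumption: lines look like "user 0m0.363s" (Bash default format).
cpu_seconds() {
  awk '$1 == "user" || $1 == "sys" {
         split($2, part, "m")          # part[1] = minutes, part[2] = "0.363s"
         sub(/s$/, "", part[2])        # drop the trailing "s"
         total += part[1] * 60 + part[2]
       }
       END { printf "%.3f\n", total }'
}

# Example report: real 39.659s, user 0.363s, sys 0.196s.
printf 'real\t0m39.659s\nuser\t0m0.363s\nsys\t0m0.196s\n' | cpu_seconds
```

For this report the CPU time is 0.559 seconds, far below the real time of 39.659 seconds, because most of the elapsed time is spent waiting on the network and the storage rather than computing.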

4.5.2.3 Test Subject

The project needs a fixed set of files that undergoes the same procedure across all the tests performed. For the consistency of the results, five files of different sizes are chosen for the test. The five different sizes are needed to show how the file size affects the performance of the git repository. The five file sizes are:

1. 10MB
2. 20MB
3. 50MB
4. 100MB
5. 200MB

The files are located in the Downloads directory of both Linux Ubuntu virtual machines, for the standard git setup and the git-annex setup. Each file is copied to the git repository to be pushed to the server. The same file is then pulled back from the server to the local machine.
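If files of these sizes are not at hand, a comparable set can be generated. The sketch below is only an illustration and is not how the test files of this project were obtained: it fills each file with random bytes, which compress poorly in much the same way as a .zip archive. The function names, the .bin file names and the output directory are hypothetical, and sizes are counted in units of 1,048,576 bytes.

```shell
# make_test_file PATH SIZE_MB
# Writes SIZE_MB blocks of 1,048,576 random bytes to PATH.
make_test_file() {
  dd if=/dev/urandom of="$1" bs=1048576 count="$2" 2>/dev/null
}

# make_test_set DIR
# Creates the five file sizes used in the test inside DIR.
make_test_set() {
  mkdir -p "$1"
  for size in 10 20 50 100 200; do
    make_test_file "$1/${size}mb.bin" "$size"
  done
}

# Example call (hypothetical directory):
# make_test_set ./testfiles
```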

Figure 4.15: List of Test Files

Figure 4.15 shows the five different files located in the Downloads directory.

4.5.2.4 Test Workflow

The five files of different sizes undergo the same test in both the standard git setup and the git-annex setup. So, a workflow framework is developed to give an overview of how the test is carried out for the files across the project.

Figure 4.16: Test Workflow

Figure 4.16 shows how the test is performed. Any file to be tested is added to the git repository in Linux Ubuntu. The file is then pushed to the server using the command git push for the standard git setup, or the command git annex sync --content for the git-annex setup. Next, after the file has been uploaded successfully, the second test is performed, in which the local machine downloads the file back from the server.

4.5.3 Pushing and Pulling Performance Test

This section explains the procedure and the steps involved in the test. The test starts with the pushing test in the standard git setup, followed by the pulling test in the standard git setup, the pushing test in the git-annex setup and finally the pulling test in the git-annex setup. Since the installation of git on the server and the local machine has been discussed, the report proceeds to how the files are pushed and pulled between the git server and the local machine.

4.5.3.1 Standard Git Pushing Performance Test

After installing git on the server and the local machine as in 4.3.1 Installation of Git, the specified test file in the Download directory in Linux Ubuntu has to be added to the git repository on the local machine (refer to 4.5.2.3 Test Subject for details).

Step 1: Copy the test file into the git repository on the local machine. Copy the test file from the Download directory into the gitclassic directory. The project starts the test with the 10MB file.

    cp /home/meor/downloads/10mb.zip .

Figure 4.17: Copying 10MB File

Step 2: Add the test file to the git repository.

    git add .

Figure 4.18: Adding 10MB File to the Git Repository

Step 3: Stage the file changes for the initial commit so that they can be pushed to the server.

    git add .

Step 4: Commit the changes with a message; the comment is written between the quotation marks. If this step is skipped, the file cannot be pushed at all.

    git commit -m "message"

Figure 4.19: Committing Message for File Changes

Step 5: Add the connection from the local machine to the Raspberry Pi server. This command must specify the remote name (gitclassic), the IP address of the Raspberry Pi (192.168.43.31) and the location of the git server repository. The connection configuration between the local machine and the server can be checked using the second command below.

    git remote add gitclassic pi@192.168.43.31:/home/pi/usbdrv
    git remote -v

Figure 4.20: Adding Connection to the Server

Step 6: Push the file to the git repository server on the Raspberry Pi. The file is pushed to the master branch of the git repository on the server. Combine the git push command with the Linux time command so that the time taken for the process to complete can be recorded.

    time git push gitclassic master

Figure 4.21: Pushing the File to the Server

Step 7: Record the time taken for the test to complete. Repeat the standard git pushing steps four more times to get an average value for the 10MB file. For the other sizes (20MB, 50MB, 100MB and 200MB), use the file of that size in Step 1 and test each file five times as well. At the end of the pushing test, there is a total of 25 tests, comprising five tests each for the 10MB, 20MB, 50MB, 100MB and 200MB files. The results of all these tests are shown in 4.6.1 Standard Git Setup Pushing Performance Test Results.
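The averaging in Step 7 can be sketched as follows. The function name mean_seconds is hypothetical; it accepts the real values as printed by time, with or without a minutes part, converts each to seconds and prints the mean. The example uses the five real values recorded for the 20MB standard git pushing test.

```shell
# mean_seconds DURATION...
# Each DURATION looks like "52.063s" or "1m5.108s" (no space after "m").
# Prints the mean of all durations in seconds.
mean_seconds() {
  printf '%s\n' "$@" | awk '
    {
      t = $0; mins = 0
      if (index(t, "m") > 0) {      # split off the minutes part, if any
        split(t, part, "m")
        mins = part[1]; t = part[2]
      }
      sub(/s$/, "", t)              # drop the trailing "s"
      total += mins * 60 + t; n++
    }
    END { if (n > 0) printf "%.4f\n", total / n }'
}

# Five real values of the 20MB standard git pushing test.
mean_seconds 1m5.108s 2m18.862s 2m11.707s 52.063s 50.674s
```

The result, 87.6828 seconds, matches the average reported for the 20MB file in the standard git pushing results.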

4.5.3.2 Standard Git Pulling Performance Test

For the pulling test, an empty git repository needs to be created first. Then, the file is pulled from the server into the empty git repository. This test is a continuation of the pushing test, meaning that the file needs to have been pushed first in order to pull it back to the local machine. Since the push test procedure pushed the 10MB file, this pull test shows the steps to retrieve the same 10MB file. The steps of this test are as below.

Step 1: Create an empty git repository. The empty git repository is created under a different directory in Linux Ubuntu. For the test, the directory is named gitest.

    mkdir gitest && cd gitest
    git init

Figure 4.22: Creating Directory and Initializing Git

Step 2: Add the connection from the local machine to the Raspberry Pi server. The connection to the server is named gitest. Check and confirm the connection to the server.

    git remote add gitest pi@192.168.43.31:/home/pi/usbdrv
    git remote -v

Figure 4.23: Adding Connection to the Server

Step 3: Pull the 10MB file back into the empty git repository on the local machine. The file is pulled into the master branch of the empty git repository. Combine the git pull command with the Linux time command to get the time taken for the test to complete.

    time git pull gitest master

Figure 4.24: Pulling Test Output

Step 4: Record the time taken for the test to complete. Repeat the standard git pulling steps four more times to get an average value for the 10MB file. For the other sizes (20MB, 50MB, 100MB and 200MB), first push the file to the server using the steps from 4.5.3.1 Standard Git Pushing Performance Test and then repeat Steps 1 to 4 five times. At the end of the pulling test, there is a total of 25 tests, comprising five tests each for the 10MB, 20MB, 50MB, 100MB and 200MB files. The results of all these tests are shown in 4.6.2 Standard Git Setup Pulling Performance Test Results.

4.5.3.3 Git-annex Pushing Performance Test

After installing git-annex on the server and the local machine as in 4.3.2 Installation of Git-annex, the specified test file in the Download directory in Linux Ubuntu has to be added to the git repository on the local machine (refer to 4.5.2.3 Test Subject for details).

Step 1: Copy the test file into the git repository on the local machine. Copy the test file from the Download directory into the gitannex directory. The project starts the pushing test with the 10MB file.

    cp /home/meor/downloads/10mb.zip .

Figure 4.25: Copying 10MB File

Step 2: Add the test file to the git repository.

    git annex add .

Figure 4.26: Adding File to the Git-annex Repository

Step 3: Stage the file changes for the initial commit so that they can be pushed to the server.

    git add .

Step 4: Commit the changes with a message; the comment is written between the quotation marks. If this step is skipped, the file cannot be pushed at all.

    git commit -m "message"

Figure 4.27: Committing Message to File Changes

Step 5: Add the connection from the local machine to the Raspberry Pi server. This command must specify the remote name (gitannex), the IP address of the Raspberry Pi (192.168.43.31) and the location of the git server repository. The connection configuration between the local machine and the server can be checked using the second command below.

    git remote add gitannex pi@192.168.43.31:/home/pi/usbdrv
    git remote -v

Figure 4.28: Adding Connection to the Server

Step 6: Push to the server on the Raspberry Pi. Note that in the git-annex setup, the push command only pushes the metadata of the file. The metadata acts as a pointer to the actual location of the content. To actually push the file content to the server, the sync command is used: the files on the local machine and the server are synchronized, which in effect pushes whatever is missing on the server. This command is limited to the git-annex setup. Executing it in a standard git setup does nothing. Combine the sync command with the Linux time command to measure the time taken for the test to complete. Keep in mind that the sync command produces quite a lengthy output.

    git push gitannex master
    time git annex sync --content

Figure 4.29: Pushing Test for Git-annex Output

Step 7: Record the time taken for the test to complete. Repeat the git-annex pushing steps four more times to get an average value for the 10MB file. For the other sizes (20MB, 50MB, 100MB and 200MB), use the file of that size in Step 1 and test each file five times. At the end of the git-annex pushing test, there is a total of 25 tests, comprising five tests each for the 10MB, 20MB, 50MB, 100MB and 200MB files. The results of all these tests are shown in 4.6.3 Git-annex Setup Pushing Performance Test Results.

4.5.3.4 Git-annex Pulling Performance Test

For the pulling test, an empty git repository with git-annex needs to be created first. Then, the file is pulled from the server into the empty repository. This test is a continuation of the git-annex pushing test, meaning that the file needs to have been pushed first in order to pull it back to the local machine. Since the push test procedure pushed the 10MB file, this pull test shows the steps to retrieve the same 10MB file. The steps of this test are as below.

Step 1: Create an empty git repository. The empty git repository is created under a different directory in Linux Ubuntu. For the test, the directory is named gito. Git-annex is then initialized in it.

    mkdir gito && cd gito
    git init
    git annex init

Figure 4.30: Creating Directory and Initializing Git

Step 2: Add the connection from the local machine to the Raspberry Pi server. The connection to the server is named gito. Check and confirm the connection to the server.

    git remote add gito pi@192.168.43.31:/home/pi/usbdrv
    git remote -v

Figure 4.31: Adding Connection to the Server

Step 3: Pull the 10MB file back into the empty git repository on the local machine. This step is done using the sync command. Since the sync command synchronizes the content of the repository between the local machine and the server, it in effect downloads whatever files are missing from the local machine. The pull command from the standard git test is not used, since the sync command can push files as well as pull them back. Combine the sync command with the Linux time command to get the time taken for the test to complete.

    time git annex sync --content

Figure 4.32: Adding Connection to the Server

Step 4: Record the time taken for the test to complete. Repeat the git-annex pulling steps four more times to get an average value for the 10MB file. For the other sizes (20MB, 50MB, 100MB and 200MB), first push the file to the server using the steps from 4.5.3.3 Git-annex Pushing Performance Test and then repeat Steps 1 to 4 five times. At the end of the pulling test, there is a total of 25 tests, comprising five tests each for the 10MB, 20MB, 50MB, 100MB and 200MB files. The results of all these tests are shown in 4.6.4 Git-annex Setup Pulling Performance Test Results.
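The repetition in Step 4 can be scripted. The sketch below is a generic helper, not the procedure used in this project: it runs any command a given number of times and prints the elapsed time of each run. The function name repeat_timed is hypothetical, and it relies on date +%s, so the resolution is whole seconds; the time command used in the steps above is more precise.

```shell
# repeat_timed RUNS COMMAND [ARG...]
# Runs COMMAND the given number of times and prints the elapsed
# wall-clock time of each run in whole seconds.
repeat_timed() {
  runs=$1
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo "run $i: $((end - start))s"
    i=$((i + 1))
  done
}

# Example with a harmless placeholder command.
repeat_timed 5 sleep 0
```

For a real pulling test, each run has to start from a fresh empty repository as in Steps 1 and 2, since a repository that already holds the file has nothing left to download.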

4.5.3.5 Pushing and Pulling Performance Test Overview

This section visualizes how the overall performance test is carried out.

Figure 4.33: Pushing and Pulling Performance Test Overview

Figure 4.33 shows the flow of the test from the start of the pushing test on the standard git setup until the very last pulling test on the git-annex setup. The steps numbered in the figure are explained below:

1. Standard git pushing test with the 10MB, 20MB, 50MB, 100MB and 200MB files, with each test done exactly five times, for a total of 25 pushing test results.
2. Standard git pulling test with the 10MB, 20MB, 50MB, 100MB and 200MB files, with each test done exactly five times, for a total of 25 pulling test results.

3. Deletion of the standard git data on the server, because git-annex needs to be installed before proceeding to the git-annex performance test.
4. Git-annex pushing test with the 10MB, 20MB, 50MB, 100MB and 200MB files, with each test done exactly five times, for a total of 25 pushing test results.
5. Git-annex pulling test with the 10MB, 20MB, 50MB, 100MB and 200MB files, with each test done exactly five times, for a total of 25 pulling test results.

There is a total of 100 test results altogether. These results are shown in 4.6 Phase 3 Project: Results Comparison and Analysis. Analysis and discussion follow later in this report.
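The total of 100 results follows from 2 setups, 2 directions, 5 file sizes and 5 repetitions. As a quick check, the sketch below enumerates every run in the order the tests are performed; the function name test_matrix and the labels it prints are hypothetical.

```shell
# test_matrix
# Prints one line per test run, in the order the tests are performed.
test_matrix() {
  for setup in standard-git git-annex; do
    for direction in push pull; do
      for size in 10MB 20MB 50MB 100MB 200MB; do
        for run in 1 2 3 4 5; do
          echo "$setup $direction $size run$run"
        done
      done
    done
  done
}

# Counting the lines gives 2 x 2 x 5 x 5 = 100 runs.
test_matrix | wc -l
```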

4.6 Phase 3 Project: Results Comparison and Analysis

This section shows the results of the 100 tests that have been performed. All results are genuine and based on the test procedure and workflow discussed previously.

4.6.1 Standard Git Setup Pushing Performance Test Results

For 10MB file size:

            Real          User      System
First       39.659s       0.363s    0.196s
Second      52.274s       0.418s    0.176s
Third       32.425s       0.463s    0.083s
Fourth      30.531s       0.394s    0.167s
Fifth       37.288s       0.439s    0.142s
Average     38.4354s      0.4154s   0.1528s

Table 4.1 Standard git setup pushing test results for 10MB file size

For 20MB file size:

            Real          User      System
First       1m 5.108s     0.320s    1.025s
Second      2m 18.862s    0.830s    0.221s
Third       2m 11.707s    0.840s    0.232s
Fourth      52.063s       0.826s    0.153s
Fifth       50.674s       0.868s    0.111s
Average     87.6828s      0.7368s   0.3484s

Table 4.2 Standard git setup pushing test results for 20MB file size

For 50MB file size:

            Real          User      System
First       4m 15.124s    0.911s    2.491s
Second      2m 6.349s     1.642s    0.671s
Third       3m 2.314s     2.045s    0.512s
Fourth      2m 10.690s    1.971s    0.332s
Fifth       2m 53.441s    1.998s    0.523s
Average     173.6034s     1.7134s   0.9058s

Table 4.3 Standard git setup pushing test results for 50MB file size

For 100MB file size:

            Real          User      System
First       5m 7.575s     3.091s    2.373s
Second      7m 42.134s    3.834s    1.524s
Third       8m 4.470s     4.242s    1.301s
Fourth      7m 11.408s    4.039s    1.303s
Fifth       9m 37.462s    4.040s    1.543s
Average     452.6098s     3.8492s   1.6088s

Table 4.4 Standard git setup pushing test results for 100MB file size

For 200MB file size:

            Real          User      System
First       15m 45.380s   6.048s    5.552s
Second      18m 26.109s   7.201s    3.771s
Third       17m 38.950s   8.498s    2.809s
Fourth      15m 9.881s    7.792s    2.897s
Fifth       12m 7.615s    8.310s    2.374s
Average     949.587s      7.5698s   3.4806s

Table 4.5 Standard git setup pushing test results for 200MB file size

4.6.2 Standard Git Setup Pulling Performance Test Results

For 10MB file size:

            Real          User      System
First       12.307s       0.268s    0.215s
Second      11.196s       0.257s    0.231s
Third       11.851s       0.278s    0.260s
Fourth      11.249s       0.301s    0.199s
Fifth       12.839s       0.284s    0.283s
Average     12.5936s      0.5398s   0.1638s

Table 4.6 Standard git setup pulling test results for 10MB file size

For 20MB file size:

            Real          User      System
First       21.870s       0.394s    0.243s
Second      21.956s       0.398s    0.540s
Third       21.242s       0.361s    0.543s
Fourth      21.816s       0.417s    0.217s
Fifth       21.990s       0.391s    0.272s
Average     28.8046s      0.7264s   0.876s

Table 4.7 Standard git setup pulling test results for 20MB file size

For 50MB file size:

            Real          User      System
First       49.088s       0.726s    0.427s
Second      47.044s       0.833s    0.530s
Third       48.293s       0.802s    0.302s
Fourth      53.484s       0.774s    0.378s
Fifth       48.113s       0.777s    0.368s
Average     67.996s       2.624s    0.9012s

Table 4.8 Standard git setup pulling test results for 50MB file size

For 100MB file size:

            Real          User      System
First       1m 24.991s    1.433s    0.542s
Second      1m 26.045s    1.374s    0.564s
Third       1m 23.924s    1.372s    0.625s
Fourth      1m 24.799s    1.319s    0.609s
Fifth       1m 23.489s    1.506s    0.388s
Average     120.4898s     4.593s    2.3624s

Table 4.9 Standard git setup pulling test results for 100MB file size

For 200MB file size:

            Real          User      System
First       3m 28.949s    2.488s    4.299s
Second      2m 53.545s    2.471s    0.804s
Third       2m 51.158s    2.308s    0.861s
Fourth      2m 56.156s    2.387s    4.043s
Fifth       2m 55.154s    2.650s    2.834s
Average     225.0294s     9.681s    4.0198s

Table 4.10 Standard git setup pulling test results for 200MB file size

4.6.3 Git-annex Setup Pushing Performance Test Results

For 10MB file size:

            Real          User      System
First       12.595s       0.220s    0.165s
Second      18.281s       0.266s    0.132s
Third       13.597s       0.179s    0.196s
Fourth      30.694s       0.229s    0.181s
Fifth       12.937s       0.192s    0.200s
Average     17.6208s      0.2172s   0.1748s

Table 4.11 Git-annex setup pushing test results for 10MB file size

For 20MB file size:

            Real          User      System
First       41.889s       0.308s    0.210s
Second      21.978s       0.241s    0.196s
Third       28.492s       0.298s    0.171s
Fourth      1m 38.098s    0.442s    0.166s
Fifth       1m 27.727s    0.335s    0.237s
Average     55.6368s      0.3248s   0.196s

Table 4.12 Git-annex setup pushing test results for 20MB file size

For 50MB file size:

            Real          User      System
First       1m 1.784s     0.492s    0.211s
Second      1m 21.326s    0.493s    0.258s
Third       54.238s       0.420s    0.247s
Fourth      1m 5.874s     0.476s    0.238s
Fifth       1m 52.033s    0.640s    0.218s
Average     75.051s       0.5042s   0.2344s

Table 4.13 Git-annex setup pushing test results for 50MB file size

For 100MB file size:

            Real          User      System
First       2m 26.681s    0.832s    0.422s
Second      2m 52.166s    0.868s    0.448s
Third       3m 59.763s    0.962s    0.432s
Fourth      4m 11.124s    0.988s    0.365s
Fifth       4m 38.861s    0.911s    0.422s
Average     217.719s      0.9122s   0.4178s

Table 4.14 Git-annex setup pushing test results for 100MB file size

For 200MB file size:

            Real          User      System
First       7m 16.839s    1.383s    2.813s
Second      5m 57.097s    1.490s    0.651s
Third       6m 33.457s    1.481s    0.638s
Fourth      7m 17.392s    1.517s    0.524s
Fifth       5m 7.001s     1.399s    0.667s
Average     386.3572s     1.454s    1.0586s

Table 4.15 Git-annex setup pushing test results for 200MB file size

4.6.4 Git-annex Setup Pulling Performance Test Results

For 10MB file size:

            Real          User      System
First       12.307s       0.268s    0.215s
Second      11.196s       0.257s    0.231s
Third       11.851s       0.278s    0.260s
Fourth      11.249s       0.301s    0.199s
Fifth       12.839s       0.284s    0.283s
Average     11.8884s      0.2776s   0.2376s

Table 4.16 Git-annex setup pulling test results for 10MB file size

For 20MB file size:

            Real          User      System
First       21.870s       0.394s    0.243s
Second      21.956s       0.398s    0.540s
Third       21.242s       0.361s    0.543s
Fourth      21.816s       0.417s    0.217s
Fifth       21.990s       0.391s    0.272s
Average     21.7748s      0.3922s   0.363s

Table 4.17 Git-annex setup pulling test results for 20MB file size

For 50MB file size:

            Real          User      System
First       49.088s       0.726s    0.427s
Second      47.044s       0.833s    0.530s
Third       48.293s       0.802s    0.302s
Fourth      53.484s       0.774s    0.378s
Fifth       48.113s       0.777s    0.368s
Average     49.2044s      0.7824s   0.401s

Table 4.18 Git-annex setup pulling test results for 50MB file size

For 100MB file size:

            Real          User      System
First       1m 24.991s    1.433s    0.542s
Second      1m 26.045s    1.374s    0.564s
Third       1m 23.924s    1.372s    0.625s
Fourth      1m 24.799s    1.319s    0.609s
Fifth       1m 23.489s    1.506s    0.388s
Average     84.6496s      1.4008s   0.5456s

Table 4.19 Git-annex setup pulling test results for 100MB file size

For 200MB file size:

            Real          User      System
First       3m 28.949s    2.488s    4.299s
Second      2m 53.545s    2.471s    0.804s
Third       2m 51.158s    2.308s    0.861s
Fourth      2m 56.156s    2.387s    4.043s
Fifth       2m 55.154s    2.650s    2.834s
Average     180.9924s     2.4608s   2.5682s

Table 4.20 Git-annex setup pulling test results for 200MB file size

4.7 Graph Visualization Based On Results

The averages of each test category are visualised as graphs so that the test performance of the standard git setup and the git-annex setup can be compared more clearly. In each graph, the standard git setup is compared directly to the git-annex setup to analyse the trend.

4.7.1 Average Pushing Real Time Test Graph

The average pushing time of the standard git setup is compared to that of the git-annex setup.

[Bar chart: Average Pushing (Upload) Real Time Test. Axes: size of files; time taken to push (in minutes). Series: Standard Git, Git-annex.]

File size   Standard Git   Git-annex
10MB        0.38           0.18
20MB        1.28           0.57
50MB        2.54           1.15
100MB       7.33           3.38
200MB       15.5           6.26

Figure 4.34: Average Pushing Real Time Test Graph

Figure 4.34 shows that the git-annex setup takes a shorter time to complete the pushing test than the standard git setup.
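The gap between the two setups can also be expressed as a ratio. The sketch below divides the average real time of the standard git setup by that of the git-annex setup, using the averages in seconds from Tables 4.1, 4.5, 4.11 and 4.15; the function name speedup is hypothetical.

```shell
# speedup STANDARD_SECONDS ANNEX_SECONDS
# Prints how many times faster the git-annex run was, to two decimals.
speedup() {
  awk -v std="$1" -v annex="$2" 'BEGIN { printf "%.2f\n", std / annex }'
}

speedup 38.4354 17.6208    # 10MB pushing averages
speedup 949.587 386.3572   # 200MB pushing averages
```

With these averages, the git-annex pushing test is roughly 2.18 times faster for the 10MB file and 2.46 times faster for the 200MB file.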

4.7.2 Average Pulling Real Time Test Graph

The average pulling time of the standard git setup is compared to that of the git-annex setup.

[Bar chart: Average Pulling (Download) Real Time Test. Axes: size of files; time taken to pull (in minutes). Series: Standard Git, Git-annex.]

File size   Standard Git   Git-annex
10MB        0.13           0.11
20MB        0.29           0.22
50MB        0.68           0.49
100MB       2              1.25
200MB       3.45           3.1

Figure 4.35: Average Pulling Real Time Test Graph

Figure 4.35 shows that the git-annex setup takes a shorter time to complete the pulling test than the standard git setup.

4.7.3 Average Pushing CPU Time Test Graph

For the CPU time test, the user and sys values are summed to give the total CPU time. For a detailed explanation, please refer to 4.5.2.3.3 Linux Command.

[Bar chart: Average Pushing (Upload) CPU Time Test. Axes: size of files; time taken to push (in seconds). Series: Standard Git, Git-annex.]

File size   Standard Git   Git-annex
10MB        0.57           0.39
20MB        1.09           0.52
50MB        2.62           0.74
100MB       5.46           1.33
200MB       11.05          2.51

Figure 4.36: Average Pushing CPU Time Test Graph

Figure 4.36 shows that the git-annex setup takes less CPU time to complete the pushing test than the standard git setup.

4.7.4 Average Pulling CPU Time Test Graph

For the CPU time test, the user and sys values are summed to give the total CPU time. For a detailed explanation, please refer to 4.5.2.3.3 Linux Command.

[Bar chart: Average Pulling (Download) CPU Time Test. Axes: size of files; time taken to pull (in seconds). Series: Standard Git, Git-annex.]

File size   Standard Git   Git-annex
10MB        0.7            0.52
20MB        1.6            0.76
50MB        3.53           1.18
100MB       6.96           1.95
200MB       13.7           5.03

Figure 4.37: Average Pulling CPU Time Test Graph

Figure 4.37 shows that the git-annex setup takes less CPU time to complete the pulling test than the standard git setup.

4.8 Chapter Summary

This chapter discusses the installation of git, the basic git commands, the performance test workflow and the results of all the tests that have been conducted. It breaks down in depth the direction of the project, especially the performance test part. At the end of this chapter, the data from all of the tests are extracted and visualized as bar graphs to show how git-annex can help the git ecosystem perform better than the standard git setup.

CHAPTER 5

CONCLUSION

5.1 Introduction

This chapter discusses the project findings and the analysis of the results from the tests that were conducted, the project constraints and limitations, future work and recommendations, and a summary of Chapter 5.

5.2 Project Findings and Results Analysis

This project produced a number of new findings about git. One of them is that git-annex must be installed on both sides: the server and the local machine. If it is installed on only one side, the user cannot use git-annex commands such as the sync command. This command is very useful and has a big advantage over a standard git setup when pushing and pulling files. Another finding is that git-annex drastically improves performance not only on the server but also on the local machine. Git-annex consistently took less time to complete the tests, which indicates that it is an effective way to improve the performance of the git environment.

The impact of git-annex is most visible with bigger files or repositories; for small files or repositories the improvement is small. This is shown by the 10MB tests, where the time saved was only slight. Projects developed by a team of developers are often large and can reach hundreds of gigabytes in size. By implementing git-annex, not only is the git environment improved, but developers can also be more productive and work on new code, since less time is spent pushing to and pulling from the server.

Git-annex also reduces the CPU workload. The graphs clearly show that the CPU spends less time executing commands when git-annex is used. This has several advantages. When less CPU time is spent on processing, the CPU is less likely to overheat and throttle, which is better for the hardware that hosts it. A throttled CPU frustrates users, because it has to lower its clock speed until the temperature drops and therefore works more slowly than it should.

During this project it was found that git-annex can also be installed on the Windows operating system, not only on Linux. However, working with the git repository on Windows is harder and users are more likely to face errors. It is recommended to install git-annex on a Linux machine rather than on Windows.
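The observation in this section that larger files benefit more can be checked against the pulling CPU times in Figure 4.37. The sketch below is only illustrative: the values are the data labels read from that figure, and the assignment of the larger value in each pair to standard git is an assumption based on the figure's stated conclusion. The names `PULL_CPU_SECONDS`, `saved_seconds` and `speedup` are hypothetical.

```python
# CPU seconds per file size, read from the data labels of Figure 4.37.
# Assumption: in each pair the larger value belongs to standard git and the
# smaller to git-annex, consistent with the figure's conclusion.
PULL_CPU_SECONDS = {
    "10MB": (0.70, 0.52),
    "20MB": (1.60, 0.76),
    "50MB": (3.53, 1.18),
    "100MB": (6.96, 1.95),
    "200MB": (13.70, 5.03),
}


def saved_seconds(size):
    """CPU seconds saved by git-annex for one file size."""
    standard, annex = PULL_CPU_SECONDS[size]
    return standard - annex


def speedup(size):
    """Ratio of standard git CPU time to git-annex CPU time."""
    standard, annex = PULL_CPU_SECONDS[size]
    return standard / annex


if __name__ == "__main__":
    for size in PULL_CPU_SECONDS:
        print(f"{size:>6}: saved {saved_seconds(size):.2f}s, "
              f"speedup {speedup(size):.2f}x")
```

Under the assumed pairing, the absolute saving grows from about 0.18 seconds at 10MB to about 8.67 seconds at 200MB, which is in line with the small improvement reported for the 10MB file.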

Finally, even though the same file sizes were used for pushing and pulling across the tests, the pulling tests always completed more quickly than the pushing tests. The main factor behind this difference is the large gap in write speed between the server and the local machine. On the server, when a file is pushed, the write speed is limited by the USB drive on the Raspberry Pi. In contrast, when a file is pulled, it is written to the SSD of the local machine. Since the project compares a standard git setup against a git-annex setup, not pushing against pulling, this information is less relevant to the overall project. It is still worth noting, because the times taken for the pushing and pulling tests are biased in favour of the pulling tests.

5.3 Project Constraints and Limitation

One of the biggest constraints in this project was that only one Raspberry Pi was available. With only one Raspberry Pi, the project had to be conducted more slowly, since two servers could not be hosted at the same time. The files and data of the standard git server had to be deleted to make way for the git-annex server.

Another constraint was the slow Wi-Fi and internet speed while conducting this project. With a faster connection, the project could have been tested with bigger files. A large file size such as 1GB would better represent the size of real-life projects in a git environment. However, since the impact of git-annex can be observed with files as small as 10MB, the project does not need file sizes above 200MB to demonstrate its performance benefits.

5.4 Future Work and Recommendation

This project has room for improvement and further potential for future work and research. Firstly, a working git server should have good hardware specifications. Although this project used only a Raspberry Pi to host the git server and it worked, it still had problems, including the storage speed issue. A Raspberry Pi is fine for personal use and for simulating projects at a smaller scale. A git server needs a proper physical host to store all the data and to keep the temperature under control at all times.

Next, the tests could be repeated using a real Linux machine. This project used only a virtual machine to host the local git repository. The disadvantages of a virtual machine include less accurate time measurements during the tests and slower response times. Because a virtual machine needs a hypervisor that acts as a bridge between the operating system of the physical hardware and the operating system of the virtual machine, the CPU has to go through the hypervisor every time, which makes the operating system in the virtual machine feel a little slower.

Another recommendation is to properly monitor the temperature of the git server's physical hardware. One factor that played a big part in the test results was the temperature of the USB drive in the Raspberry Pi. The time taken to complete a test was noticeably longer when the USB drive was hotter than usual. This was most apparent in the pushing tests, since those write files onto the USB drive.
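As a starting point for the temperature monitoring recommended above, the following is a minimal Python sketch that logs temperature readings while a test runs. It assumes the standard Linux sysfs thermal zone file, which on a Raspberry Pi reports the temperature of the processor in millidegrees Celsius. It does not measure the USB drive itself, which would need a separate sensor. The function names and the default path constant are assumptions made for this example.

```python
import time
from pathlib import Path

# Assumption: standard Linux sysfs thermal zone. On a Raspberry Pi this file
# holds the processor temperature in millidegrees Celsius, e.g. "48312".
DEFAULT_SENSOR = Path("/sys/class/thermal/thermal_zone0/temp")


def read_temperature_celsius(sensor_path=DEFAULT_SENSOR):
    """Read one temperature value and convert it to degrees Celsius."""
    raw = Path(sensor_path).read_text().strip()
    return int(raw) / 1000.0


def log_temperature(samples, interval_seconds, sensor_path=DEFAULT_SENSOR):
    """Collect `samples` readings as (timestamp, degrees Celsius) pairs."""
    readings = []
    for _ in range(samples):
        readings.append((time.time(), read_temperature_celsius(sensor_path)))
        time.sleep(interval_seconds)
    return readings


if __name__ == "__main__":
    # Take five readings, one second apart, while a test is running.
    for timestamp, celsius in log_temperature(5, 1.0):
        print(f"{timestamp:.0f} {celsius:.1f} C")
```

Recording these readings alongside each test run would make it possible to discard or repeat runs that were taken while the hardware was hotter than usual.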

5.5 Chapter Summary

Git is a very important platform for developers, as it lets them manage their work easily and systematically. A slow git repository reduces developer productivity, since developers spend more time waiting for files to transfer between the server and their local machines. It is therefore very important for the git environment to have an optimized workflow and fast performance. This project was proposed to simulate and implement git-annex to address the problem of slow git server performance. It also provides a full tutorial for users who want to implement git-annex features in their git repository. Hopefully, this project will help people solve their slow git environment problems. This project has been a wonderful experience and I have enjoyed every moment of conducting it. In conclusion, git-annex is clearly a simple and powerful tool that helps to improve the performance not only of the git server, but of the whole git ecosystem in general.


APPENDICES