BACKUP PLANNING AND IMPLEMENTATION FOR ANUPAM SUPERCOMPUTER USING ROBOTIC AUTOLOADERS

By
Aalap Tripathy
2004P3PS208
B.E. (Hons.) Electrical & Electronics

Prepared in partial fulfilment of the Practice School I Course

at
Supercomputing Research Facility, Computer Division
Bhabha Atomic Research Centre, Trombay
A Practice School I Station of
BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE, PILANI

July, 2006

ACKNOWLEDGEMENT

It is a moment of intense pleasure to express my gratitude towards everyone who directly or indirectly helped me during this project work. I am thankful to Sh. A G Apte, Head, Computer Division, for providing me an opportunity to work on this project. My sincere gratitude to Mr. K Rajesh, SO(F), SRF, Computer Division, for having assigned this project to me and assisted me throughout its course. Despite his busy schedule, he always found time for a sharp review of each day's progress. It is truly an honour for me to have been associated with such a co-operative and extraordinarily brilliant scientist. I am absolutely sure that this project work could not have reached its present form without the timely troubleshooting of Mr. Kislay Bhatt, SO(E), SRF, Computer Division, who rescued me from numerous weird error messages. Thanks to Mr. Phool Chand, SO(E), Computer Division, for his help in the design of the Backup Management Suite, especially in critical PHP code segments. I must specially thank Mr. Rohitashva Sharma, SO(E), Computer Division, for his timely help and support in Mr. Phool Chand's absence. Thank you, Sirs. I must mention here that none of this would have been possible but for the confidence of Dr. Himanshu Agarwal, PS faculty-in-charge, in my abilities. He made it a point to have me assigned to this Division of my choice for my project.

Aalap Tripathy
aalap@bits-goa.ac.in

BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI (RAJASTHAN)
Practice School Division

Station : Computer Division, BARC
Centre : Mumbai, Maharashtra
Title of the Project : BACKUP PLANNING AND IMPLEMENTATION FOR ANUPAM SUPERCOMPUTER USING ROBOTIC AUTOLOADERS
Duration : 24th May, 2006 - 15th July, 2006
Date of Start : 2nd June, 2006
Date of Submission : 13th July, 2006

Prepared By :
ID Number : 2004P3PS208
Name : Aalap Tripathy
Discipline : B.E. (Hons.) Electrical & Electronics

Experts Involved :
Mr. K Rajesh, Scientific Officer (F), Supercomputing Research Facility, Computer Division
Mr. Kislay Bhatt, Scientific Officer (E), Supercomputing Research Facility, Computer Division
Mr. Phool Chand, Scientific Officer (E), Supercomputing Research Facility, Computer Division

Name of the PS Faculty : Dr. Himanshu Agarwal, Asst. Professor, Mechanical Engineering Group

Key Words : Automation, Backup, Supercomputers, High Performance Clusters, Autoloaders, Tape Drives, High Capacity Tapes, Backup Management, Shell Scripting

Project Areas : Backup automation for user data to the tune of 3.2 terabytes; customized scripts to simplify administration of the autoloader through robotic control commands.

Abstract : This project aims to automate backup from 12 fileservers to tapes mounted on a Tandberg tape autoloader - LTO Ultrium (Ultrium 2) - by the use of shell scripts. A complete package including a web interface, BMS (Backup Management Suite), has been developed and customized for use with the ANUPAM supercomputer at BARC.

Signature of Student / Date                                Signature of PS Faculty / Date

CONTENTS

1  Introduction
2  Shell Scripting, PHP, Javascript
3  Commercial/Open Source Implementations
4  Resources Used
5  Overview
6  Implementation Details
7  Code Functionality
8  Code Explanation
9  Future Enhancements
10 References

Appendix A : Technical Specifications - ANUPAM Supercomputer
Appendix B : Technical Data - Tandberg SuperLoader LTO2
Appendix C : Technical Data - Backup Server
Appendix D : Technical Data - File Server
Appendix E : Study of User Space Utilisation (21st June, 2006)
Appendix F : Code View of the Shell Scripts Written
Appendix G : Code View of the PHP Scripts & HTML Pages

1. Introduction

1.1 Objective

The major driving force behind this work was to automate the backup procedure for the fileservers (sagar01-sagar12), moving from full backups taken on the 1st, 8th, 15th and 23rd of each month to a combination of full and incremental backups. Failsafe procedures were to be developed, with appropriate checks for extraordinary circumstances such as power failure or hardware error. Logging was to be improved, and searchable index files of the files written to each backup were to be generated. The aim is to provide a complete backup management suite similar to what the major commercial and open source packages provide. Additional features such as holding disks were also to be implemented (a minimal illustrative sketch of such a full-plus-incremental scheme appears below, after the cluster overview).

1.2 System Analysis and Definition

ANUPAM is a series of parallel computers developed at the Computer Division, BARC, to cater to the need for high computing power for various high performance applications. The system is used by BARC scientists in a wide range of fields such as Reactor Physics, Reactor Engineering, Crystallography, Molecular Dynamics and Computational Fluid Dynamics. The cluster is spread across 14 racks:

o 10 racks (racks 1-5, 10-14) house the compute nodes
o 2 racks (racks 6, 9) house the storage and backup servers
o 2 racks (racks 7, 8) house the service nodes and networking equipment
o Racks numbered N1 to N4 are campus networking racks

[Figures: plan layout and perspective view of the Ameya supercomputer]
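As an illustration of the full-plus-incremental scheme and the searchable index described in the objective, the following sketch uses GNU tar's --listed-incremental snapshot mechanism and writes first to a holding-disk area. The partition name, paths and level convention are assumptions made for illustration; this is not the actual BMS script.

#!/bin/bash
# Illustrative sketch only: full vs. incremental backup of one partition
# to a holding disk, with a searchable per-run index. All paths are assumed.
PART=/home/usr010                         # partition to back up (assumed)
HOLD=/holding/usr010                      # holding-disk area (assumed)
SNAP=$HOLD/usr010.snar                    # GNU tar incremental snapshot file
STAMP=$(date +%Y%m%d)
ARCHIVE=$HOLD/usr010.$STAMP.$1.tar
INDEX=$HOLD/usr010.$STAMP.$1.index

mkdir -p "$HOLD"
case "$1" in
  full) rm -f "$SNAP" ;;                  # forgetting the snapshot forces a level-0 (full) dump
  incr) [ -f "$SNAP" ] || { echo "run a full backup first"; exit 1; } ;;
  *)    echo "usage: $0 full|incr"; exit 1 ;;
esac

# The verbose file list is captured as a per-run index, so a requested file
# can be located with grep before any tape is mounted.
tar --listed-incremental="$SNAP" -cvf "$ARCHIVE" "$PART" > "$INDEX" 2> "$INDEX.err"
echo "$(date): $1 backup of $PART written to $ARCHIVE, index in $INDEX"

The archive sitting on the holding disk can later be streamed to a mounted tape, which keeps the tape drive occupied only for the duration of the copy.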

[Figures: perspective views of the Aruna and Ashva clusters]

1.2.1 System/Network Diagram of Ameya

All user files are stored on partitions located on one of the 12 storage servers. These are accessible from the backup servers (kosh01 and kosh02), which are themselves interfaced to the tape backup devices that perform the backup operation, as explained further below. All development and control is done from the user terminals shown in the diagram.
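The data path just described - fileserver partition, NFS mount on a backup server, tape drive - can be sketched as follows. The export name, mount point and device nodes are assumptions for illustration and do not necessarily match the real configuration of kosh01/kosh02.

#!/bin/bash
# Sketch of the storage-server -> backup-server -> tape data path.
# Export name, mount point and device nodes are illustrative assumptions.
SRC=sagar01:/export/usr010        # partition exported by a fileserver (assumed)
MNT=/backup/mnt/usr010            # temporary mount point on the backup server (assumed)
TAPE=/dev/st0                     # tape drive attached to the backup server (assumed)

mkdir -p "$MNT"
mount -t nfs -o ro "$SRC" "$MNT" || exit 1    # a read-only mount is enough for backup

mt -f "$TAPE" rewind                          # position the tape at its beginning
tar -cf "$TAPE" -C "$MNT" .                   # stream the whole partition to tape
mt -f "$TAPE" offline                         # rewind and eject when finished

umount "$MNT"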

1.2.2 Submission of Jobs

A job is typically a shell script, a FORTRAN program, a C program, etc., with a set of attributes which provide resource and control information about the job. "Batch" means that the job will be scheduled for execution at a time chosen by the subsystem (defined below) according to a defined policy and the availability of resources. The Portable Batch System (PBS) is a networked subsystem for submitting, monitoring and controlling a workload of batch jobs on one or more systems. In PBS, a queue is just a collection point for jobs and does not imply execution ordering; ordering is determined by a scheduling policy decided by the system administrator. These details are given only for academic interest and are not required for an understanding of the backup mechanism described later.

1.3 Analysis of Organizational Data

For the execution of a job, users of ANUPAM have to submit valid program(s) with the associated data set onto their user space. The results of their computations are also stored there. There is no restriction on the number or size of jobs which can be submitted to the system. System administrators are also flexible with respect to the amount of space allocated to a user, i.e. it is often changed on request. However, the size of the filesystems is fixed, and their entire content is to be backed up to the tape drives.

User data for ANUPAM (Ashva and Ameya) consists of programs, scripts and applications. These reside on the fileservers, which are mounted by an automount script via the Network File System onto mount points accessible as /home/usr010, /home/usr011, /home/usr020, /home/usr021, /home/usr030, /home/usr040, /home/usr050, /home/usr060, /home/usr070, /home/usr080, /home/usr090, /home/usr110, /home/usr120. Besides these, there are other partitions which are not backed up; they carry single-digit names, e.g. /home/usr7. Each of the above-mentioned partitions contains home directories of various users and is 200 GB in size. A study of each partition, as detailed in Appendix E, showed that in most cases the entire space is not currently being used.

1.4 Choosing the Right Backup Type - Requirements vs. Issues Involved

Backup scenarios:
o Server Backup
o Large Network Backup
o Personal Backup

The system at ANUPAM corresponds to the Large Network type, where the filesystems are connected to each other and to the controller by Fast Ethernet switches (1 Gbps).

When to back up:
o Backup is a resource-hungry process; it stresses both the systems and the network.
o A full backup takes 4 hours on average, so backup should be done when the system is not in active use.
o A full backup may be done on weekends, where the backup window is 24-36 hours.
o Incremental backups are to be done on weekdays after office hours, with a backup window of 3-12 hours.

Volume configuration:
o The data on each file server runs into terabytes, so high efficiency is to be achieved by transferring only essential data the minimum number of times.
o The life of the tapes is also an issue; the supplied tapes have a stated endurance of 2000 rewrite cycles.

Backup policy:
o The backup policy should be such that system administrators can assure users that their files will be safe to the maximum possible extent.
o It should be possible to restore the version of a file that the user requests, say the version as of a given date.
o All possible user files should be backed up onto tapes with minimum effort.

1.5 Possibility of Growth

There are currently 12 fileservers, each with on average 6 x 200 GB hard disks. This computes to 1200 x 12 = 14400 GB, i.e. approximately 14 terabytes. Further, there is a proposal to increase the current configuration from 256 nodes (256 x 2 = 512 CPUs) to an aggregate of 1000 CPUs. There is also a proposal to migrate to higher-capacity tape drives with autoloader capability, and an autoloader holding 100 tapes is proposed to be purchased. The current system is to be built to survive this expansion, and the backup scripts should run on the more advanced autoloaders with little change.

1.6 Tape Loader Interfacing

The Tandberg autoloader was equipped with 16 removable tapes, each with a recording capacity of 200 GB. This adds up to an available storage capacity of 3200 GB, i.e. 3.2 TB. The autoloader was interfaced to a server running Scientific Linux; this server was kosh02.
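The robotic side of the SuperLoader can be driven from the backup server with the standard Linux mtx utility for SCSI media changers, together with mt for the drive itself. The sequence below is an illustrative sketch, assuming the changer appears as /dev/sg1 and the drive as /dev/st0; the actual device nodes depend on the SCSI configuration of kosh02.

# Illustrative autoloader control; /dev/sg1 (changer) and /dev/st0 (drive) are assumed device nodes.
mtx -f /dev/sg1 status            # show the drive and the 16 slots, and which of them hold tapes
mtx -f /dev/sg1 load 3            # move the tape in slot 3 into the drive
mt  -f /dev/st0 rewind            # position the tape before reading or writing
# ... run the backup or restore against /dev/st0 here ...
mt  -f /dev/st0 offline           # rewind and eject the tape from the drive mechanism
mtx -f /dev/sg1 unload 3          # return the tape to slot 3

Customized wrappers around calls of this kind are one way to provide the simplified autoloader administration mentioned under the project areas.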