Introduction to GACRC Storage Environment. Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer

Similar documents
Introduction to GACRC Storage Environment. Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using zcluster at GACRC

Introduction to High Performance Computing (HPC) Resources at GACRC

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using the New Cluster at GACRC

Introduction to HPC Using the New Cluster (Sapelo) at GACRC

Introduction to HPC Using zcluster at GACRC

Introduction to High Performance Computing (HPC) Resources at GACRC

Introduction to HPC Using the New Cluster (Sapelo) at GACRC

High Performance Computing (HPC) Using zcluster at GACRC

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using zcluster at GACRC On-Class GENE 4220

Introduction to HPC Using the New Cluster at GACRC

Introduction to HPC Using the New Cluster at GACRC

Introduction to HPC Using zcluster at GACRC

Introduction to HPC Using Sapelo Cluster at GACRC

Migrating from Zcluster to Sapelo

Python on GACRC Computing Resources

Introduction to High Performance Computing Using Sapelo2 at GACRC

Introduction to Linux Basics

Using Sapelo2 Cluster at the GACRC

High Performance Compu2ng Using Sapelo Cluster

Introduction to HPC Using the New Cluster at GACRC

Introduction to HPC Using Sapelo Cluster at GACRC

Introduction to HPC Using Sapelo at GACRC

GACRC User Training: Migrating from Zcluster to Sapelo

Using the computational resources at the GACRC

Introduction to HPC Using Sapelo Cluster at GACRC

Introduction to HPC Using Sapelo Cluster at GACRC

Linux Training. for New Users of Cluster. Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala

UoW HPC Quick Start. Information Technology Services University of Wollongong. ( Last updated on October 10, 2011)

Introduction to GACRC Teaching Cluster

Introduction to GACRC Teaching Cluster PHYS8602

Introduction to GACRC Teaching Cluster

High Performance Computing Cluster Basic course

For Dr Landau s PHYS8602 course

Introduction to UNIX command-line

Arkansas High Performance Computing Center at the University of Arkansas

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing

New User Tutorial. OSU High Performance Computing Center

The cluster system. Introduction 22th February Jan Saalbach Scientific Computing Group

Introduction to High-Performance Computing (HPC)

Python Language Basics II. Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer

Quick Start Guide. by Burak Himmetoglu. Supercomputing Consultant. Enterprise Technology Services & Center for Scientific Computing

Introduction to the Linux Command Line

Unix/Linux Operating System. Introduction to Computational Statistics STAT 598G, Fall 2011

Introduction to High-Performance Computing (HPC)

Data Management at ARSC

How to Use a Supercomputer - A Boot Camp

BIOINFORMATICS POST-DIPLOMA PROGRAM SUBJECT OUTLINE Subject Title: OPERATING SYSTEMS AND PROJECT MANAGEMENT Subject Code: BIF713 Subject Description:

OBTAINING AN ACCOUNT:

Research. We make it happen. Unix Basics. User Support Group help-line: personal:

High Performance Computing (HPC) Club Training Session. Xinsheng (Shawn) Qin

ICS-ACI System Basics

Introduction to Linux Basics Part I. Georgia Advanced Computing Resource Center University of Georgia Suchitra Pakala

Introduction to HPC Resources and Linux

Linux for Beginners Part 1

Storage Discussion. GACRC Advisory Committee Meeting February 24, 2015

An Introduction to Cluster Computing Using Newton

Computing with the Moore Cluster

Computer Systems and Architecture

Introduction to Discovery.

Using ISMLL Cluster. Tutorial Lec 5. Mohsan Jameel, Information Systems and Machine Learning Lab, University of Hildesheim

HPC Introductory Course - Exercises

Introduction to Unix. University of Massachusetts Medical School. October, 2014

Computer Systems and Architecture

Linux command line basics II: downloading data and controlling files. Yanbin Yin

Linux 系统介绍 (I) 袁华

Embedded Linux Systems. Bin Li Assistant Professor Dept. of Electrical, Computer and Biomedical Engineering University of Rhode Island

Some useful UNIX Commands written down by Razor for newbies to get a start in UNIX

Ftp Command Line Commands Linux Example Windows Put

Lab 1 Introduction to UNIX and C

Practical Knowledge Transfering, moving and exporting files Martin Dahlö

The JANUS Computing Environment

Lisa User Day Efficient use of file systems. John Donners

HPC at W&M / VIMS. June 17, 2015 Eric J. Walter

Using Linux as a Virtual Machine

A Hands-On Tutorial: RNA Sequencing Using High-Performance Computing

Working with Basic Linux. Daniel Balagué

CSC BioWeek 2018: Using Taito cluster for high throughput data analysis

EE516: Embedded Software Project 1. Setting Up Environment for Projects

CSC BioWeek 2016: Using Taito cluster for high throughput data analysis

Introduction to UNIX command-line II

GNU/Linux 101. Casey McLaughlin. Research Computing Center Spring Workshop Series 2018

Cross-compilation with Buildroot

Graham vs legacy systems

Linux for Biologists Part 2

Unix Essentials. BaRC Hot Topics Bioinformatics and Research Computing Whitehead Institute October 12 th

SIEMENS UserAdmin Workshop TELEPERM XP Version 4 Chapter 1

Before We Start. Sign in hpcxx account slips Windows Users: Download PuTTY. Google PuTTY First result Save putty.exe to Desktop

Welcome to getting started with Ubuntu Server. This System Administrator Manual. guide to be simple to follow, with step by step instructions

Linux Tutorial - ScINET. August 29, 2017 Jonathan Shao, Computational Biologist, NEA

Knights Landing production environment on MARCONI

Linux Operating System Environment Computadors Grau en Ciència i Enginyeria de Dades Q2

Introduction to Unix The Windows User perspective. Wes Frisby Kyle Horne Todd Johansen

Getting started with the CEES Grid

Introduction to Discovery.

An Introduction to Gauss. Paul D. Baines University of California, Davis November 20 th 2012

Unix background. COMP9021, Session 2, Using the Terminal application, open an x-term window. You type your commands in an x-term window.

Transcription:

Introduction to GACRC Storage Environment Georgia Advanced Computing Resource Center University of Georgia Zhuofei Hou, HPC Trainer zhuofei@uga.edu

Outline What is GACRC? Overview of Linux Commands GACRC Storage Environment Data Transferring Snapshot and Backup Best Practice Suggestions from GACRC

What is GACRC? Who Are We? Georgia Advanced Computing Resource Center Collaboration between the Office of Vice President for Research (OVPR) and the Office of the Vice President for Information Technology (OVPIT) Guided by a faculty advisory committee (GACRC-AC) Why Are We Here? To provide computing hardware and network infrastructure in support of highperformance computing (HPC) at UGA Where Are We? http://gacrc.uga.edu (Web) http://gacrc.uga.edu/help/ (Web Help) https://wiki.gacrc.uga.edu/wiki/getting_help (Wiki Help) http://wiki.gacrc.uga.edu (Wiki)

GACRC Users September 2015 Colleges & Schools Depts PIs Users Franklin College of Arts and Sciences 14 117 661 College of Agricultural & Environmental Sciences 9 29 128 College of Engineering 1 12 33 School of Forestry & Natural Resources 1 12 31 College of Veterinary Medicine 4 12 29 College of Public Health 2 8 28 College of Education 2 5 20 Terry College of Business 3 5 10 School of Ecology 1 8 22 School of Public and International Affairs 1 3 3 College of Pharmacy 2 3 5 40 214 970 Centers & Institutes 9 19 59 TOTALS: 49 233 1029

GACRC Users September 2015 Centers & Institutes PIs Users Center for Applied Isotope Study 1 1 Center for Computational Quantum Chemistry 3 10 Complex Carbohydrate Research Center 6 28 Georgia Genomics Facility 1 5 Institute of Bioinformatics 1 1 Savannah River Ecology Laboratory 3 9 Skidaway Institute of Oceanography 2 2 Center for Family Research 1 1 Carl Vinson Institute of Government 1 2 19 59

Overview of Linux Commands Folder Navigating File Copying and Moving File Compression and Packaging Disk Storage and Filesystem

Overview of Useful Linux Commands Folder Navigating pwd: Print the absolute path of your current directory : pwd cd: Change current directory : cd..,cd /,cd /home/yourhome File Copying and Moving cp: Copy files : cp file1 file2, cp file1./subdir mv: Rename or move files : mv file1 file2, mv file1 file2./subdir

Overview of Useful Linux Commands File Compression and Packaging gzip: Compress files with GNU Zip gzip file Compress file to create file.gz. Original file is deleted gunzip: Uncompress GNU Zip files gunzip file.gz Uncompress file.gz to create file. Original file.gz is deleted.

Overview of Useful Linux Commands File Compression and Packaging tar: Pack multiple files and directories into a single file for transport, optionally compressed tar cvf myarchive.tar./mydir tar tvf myarchive.tar tar xvf myarchive.tar Create package List contents Extract package tar czvf myarchive.tar.gz./mydir tar tzvf myarchive.tar.gz tar xzvf myarchive.tar.gz Create & Compress List contents Uncompress & Extract

Overview of Useful Linux Commands Disk Storage and Filesystem ls: List the contents (files and subdirectories) of a directory ls l Long listing including file attributes ls h Print file sizes in KB, MB, and GB, instead of bytes ls a List all files, including hidden files whose names begin with a dot du: Measure the disk space occupied by files and directories du h Measure the size of current directory and all its subdirectories du h file1 file2 Measure the size of two files

Overview of Useful Linux Commands Disk Storage and Filesystem df: Report on all mounted filesystems with the size, used space, and free space df h Print human-readable output, and choose the most appropriate unit for each size Filesystem Size Used Avail Use% Mounted on /dev/mapper/volgroup01-logvol_root 99G 14G 84G 15% / devtmpfs 16G 0 16G 0% /dev tmpfs 16G 2.4M 16G 1% /run /dev/sda1 486M 59M 402M 13% /boot /dev/mapper/volgroup01-logvol_home 493G 86G 406G 18% /home

GACRC Storage Environment zcluster Storage Environment Sapelo Storage Environment GACRC Storage Environment

zcluster Storage Environment

zcluster Storage Environment Filesystem Role Quota Accessible from Intended Use Notes /home/abclab/username /escratch4/username Home Scratch 100GB 4TB zcluster.rcc.uga.edu (Login) Interactive nodes (Interactive) copy.rcc.uga.edu (Copy) xfer2.gacrc.uga.edu (Transfer) compute nodes (Compute) Highly static data being used frequently Temporarily storing large data being used by jobs Snapshots Auto-deleted in 37 days /lscratch/username Local Scratch 18 ~ 370GB /project/abclab Storage Variable Note: Individual compute node Jobs with heavy disk I/O User to clean up copy.rcc.uga.edu (Copy) xfer2.gacrc.uga.edu (Transfer) 1. /usr/local : Software installation directory /db : bioinformatics database installation directory 2. To login to Interactive nodes, use qlogin from Login node Long-term data storage Group sharing possible

zcluster Storage Environment 6 Main Function On/From-Node Related Filesystem Login Landing Login or Copy /home/abclab/username (Home) (Always!) Batch Job Submitting Interactive Job Running Data Archiving, Compressing and Transferring Job Data Temporarily Storing Login or Interactive Interactive Copy or Transfer Compute /escratch4/username (Scratch) (Suggested!) /home/abclab/username (Home) /escratch4/username (Scratch) /home/abclab/username (Home) /escratch4/username (Scratch) /home/abclab/username (Home) /lscratch/username (Local Scratch) /escratch4/username (Scratch) Long-term Data Storing Copy or Transfer /project/abclab

Sapelo Storage Environment

Sapelo Storage Environment Filesystem Role Quota Accessible from Intended Use Notes /home/username Home 100GB /lustre1/username Scratch No Limit /lscratch/username Local Scratch sapelo1.gacrc.uga.edu (Login) Interactive nodes (Interactive) xfer2.gacrc.uga.edu (Transfer) build1.gacrc.uga.edu (Build) compute nodes (Compute) Interactive nodes (Interactive) xfer2.gacrc.uga.edu (Transfer) compute nodes (Compute) Highly static data being used frequently Temporarily storing large data being used by jobs Snapshots Auto-moved to /project if 30 days no modification 250GB Individual compute node Jobs with heavy disk I/O User to clean up /project/abclab Storage Variable xfer2.gacrc.uga.edu (Transfer) Long-term data storage Note: 1. /usr/local/apps : Software installation directory /db : bioinformatics database installation directory 2. To login to Interactive nodes, use qlogin from Login node Group sharing possible

Sapelo Storage Environment 7 Main Functions On/From-Node Related Filesystem Login Landing Login or Transfer or Build /home/username (Home) (Always!) Batch Job Submitting Interactive Job Running Data Archiving, Compressing and Transferring Job Data Temporarily Storing Login Interactive Interactive Transfer Compute /home/username (Home) /lustre1/username (Scratch) (Suggested!) /home/username (Home) /lustre1/username (Scratch) /home/username (Home) /lustre1/username (Scratch) /home/username (Home) Long-term Data Storing Copy or Transfer /project/abclab /lscratch/username (Local Scratch) /lustre1/username (Scratch) Code Compilation Build /home/username (Home)

GACRC Storage Environment Next Page

zcluster zone Transfer node Sapelo zone

GACRC Storage Environment What you should know about xfer2 (xfer2.gacrc.uga.edu): Transfer node b/w zcluster and Sapelo + Copy node of Sapelo Home directory on xfer2 = Home directory on Login of Sapelo : /home/username File systems on xfer2: /home/username : Sapelo home /panfs/pstor.storage/home/abclab/username : zcluster home /lustre1/username : Sapelo scratch /escratch4/username : zcluster scratch /project/abclab : long-term archival storage Most file systems on xfer2 are auto-mounted upon the first time full-path access, e.g., cd /lustre1/username. The command ls and TAB auto-completion may not work if the file system has not been mounted.

GACRC Storage Environment What you should know about Copy (copy.zcluster.rcc.uga.edu): Copy node of zcluster Home directory on Copy = Home directory on Login of zcluster : /home/abclab/username File systems on Copy: /home/abclab/username : zcluster home /escratch4/username : zcluster scratch /project/abclab : long-term archival storage /project file system on Copy is auto-mounted upon the first time full-path access, e.g., cd /project/abclab/username. The command ls and TAB auto-completion may not work if the file system has not been mounted.

Data Transferring b/w two filesystems on zcluster b/w two filesystems on Sapelo b/w local and GACRC Storage b/w GACRC zcluster and Sapelo b/w Internet and GACRC Storage Refer to https://wiki.gacrc.uga.edu/wiki/transferring_files

Data Transferring b/w two filesystems on zcluster Transfer interactively: Login to Copy Use cd to change directory Use cp or mv to copy or move data Transfer by copy queue: Create copying job submission script: copy.sh, e.g.: #!/bin/bash cd ${HOME} cp r datadir /project/abclab/username Submit to copyq: qsub q copyq copy.sh

Data Transferring b/w two filesystems on Sapelo /lustre1 scratch is visible on xfer2 or Interactive, NOT on Login! Transfer interactively on xfer2: Login to xfer2 Use cd to change directory Use cp or mv to copy or move data

Data Transferring b/w local and GACRC Storage zcluster users: Use Copy Linux/Mac OS X machine: scp, sftp, or FileZilla Windows machine: SSH file Transfer, FileZilla, or WinSCP Sapelo users: Use xfer2 Linux/Mac OS X machine: scp, sftp, or FileZilla Windows machine: SSH file Transfer, FileZilla, or WinSCP

Data Transferring b/w GACRC zcluster and Sapelo All users having zcluster and Sapelo accounts: Login to xfer2 Filesystems on xfer2: /home/username /panfs/pstor.storage/home/abclab/username /lustre1/username /escratch4/username /project/abclab Use cd to change directory Use cp or mv to copy or move data : Sapelo home : zcluster home : Sapelo scratch : zcluster scratch : long-term archival storage

Data Transferring b/w Internet and GACRC Storage zcluster users: Login to Copy (copy.rcc.uga.edu) Sapelo users: Login to xfer2 (xfer2.gacrc.uga.edu) Use command wget or curl to download software from internet, e.g., wget http://www.ebi.ac.uk/ena/data/view/srr1183952 Curl OL http://www.ebi.ac.uk/ena/data/view/srr1183952

Snapshot Only homes on zcluster and Sapelo are snapshotted! Note: home is for highly static data being used frequently Snapshots are completely invisible, read-only, and moment-in-time. 4 daily ones and 1 weekly one are maintained. Snapshots are eating up your Sapelo home 100GB, if there are frequent data modifications in home.

Backup Backup environment has not been implemented by GACRC yet. In the future, file systems to be included in GACRC Backup: Zcluster /home Sapelo /home Sapelo /project

Best Practice Suggestions from GACRC 1. From scratch (Sapelo /lustre1 or zcluster /esratch4), instead of from home, to submit your batch jobs or run your interactive jobs! Question: How to submit batch jobs from scratch? 1) From Sapelo /lustre1: Method 1: Login to Interactive (qlogin) cd /lustre1/username/workdir/ submit job Method 2: Login to Login Put cd /lustre1/username/workdir/ and qsub actual.sh 2) From zcluster /escratch4: in job submission script sub.sh qsub sub.sh Method 1: Login to Login Method 2: Login to Interactive (qlogin) cd /escratch4/username submit job

Best Practice Suggestions from GACRC 2. Clean Up Files that are not needed from scratch 3. Move Files from scratch to /project for long-term storage 4. Compress Files, especially text files in /project, to save space Please Do NOT Park Your Data in Scratch! Otherwise, whole system scratching performance will be affected, and your and others job will be affected!

Thank You!