HPC DOCUMENTATION


1. Hardware Resources:-
Our HPC consists of a blade chassis with 5 blade servers and one GPU rack server.
a. Total cores available for computing: 96
b. Cores reserved and dedicated to the master node: 4
c. Cores reserved and dedicated to the GPU node: 20
d. Total cores (sum of the above): 120
e. GPU details: NVIDIA Tesla K20m GPU with 2496 CUDA cores
f. Total primary memory available: 441 GB (96 GB per blade and 64 GB for the GPU server)
g. Total scratch area (SAN and local): 1 TB + 3.8 TB = 4.8 TB
h. Total user data area (SAN): 4 TB
i. Total theoretical performance: 2.526 TFLOPS
j. Total practical performance: 75% of the theoretical figure (benchmarked with the Linpack tool)

2. Software Resources:-
a. Operating system: CentOS 6.5 (kernel version 2.6.32-431.11.2.el6.x86_64)
b. Compilers and libraries: Intel Parallel Studio 2013, CUDA compilers
c. Applications: MATLAB 2013b, MATLAB 2015a, MATLAB 2015b, GROMACS, Python, Caffe, Theano, Keras
d. Scheduler: Grid Engine
e. Monitoring tool: Ganglia
f. MPI: Open MPI 1.8.1

3. Node Names and IP Addresses:-
Node details and their individual IP addresses are given below:

Sl No.  Host Name          Node Type      IIITD LAN IP
--------------------------------------------------------------
1       hpc.iiitd.edu.in   Master node    192.168.1.170
2       compute-0-0        Compute node   ssh from master node
3       compute-0-1        Compute node   ssh from master node
4       compute-0-2        Compute node   ssh from master node
5       compute-0-3        GPGPU server   ssh from master node
6       compute-0-4        Compute node   ssh from master node

Note: Access to the compute nodes from the master node is given only to privileged users who require it.
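As a quick check of the software stack listed in section 2, the commands below can be run after logging in to the master node (a minimal sketch; it assumes the listed tools are on the default PATH, and that the CUDA compiler is only present on the GPU node, compute-0-3):

# Confirm the MPI stack listed in section 2 (should report Open MPI 1.8.1)
mpirun --version

# Confirm the CUDA compiler (run on the GPU node, compute-0-3)
nvcc --version

# Privileged users only: hop from the master node to a compute node (see section 3)
ssh compute-0-0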

4. Scratch Area and User Data Area Details:-
There are two types of scratch area, as given below:
a. /ssd-scratch: 1 TB of shared space allocated from SAN SSD drives. This space is available to all nodes in common. All users are expected to use this shared space for rapidly accessed temporary computational data.
b. /scratch: 900 GB of local disk space available on each compute node individually, plus 200 GB allocated from the master node. With 5 compute nodes and 1 master node as listed above, the total space available is around 3.8 TB across all nodes. Users are expected to save temporary data to this area. Data saved on this drive is purged every 15 days; purging targets files last accessed more than 15 days ago. Users are therefore advised to move data they want to keep from /scratch to their data area, i.e. /home/<username> (see the example after section 5).

The user data area is where users store their computational data permanently. A total of 4 TB is allocated from the SAN for this purpose, and this space is available globally across all nodes. /home/<username> is the location of each user's data area. Quotas and restrictions are applied per user and per group on the data area; details are given in sections 5 and 6. No user is allowed to delete files in another user's data area.

5. Users and Groups:-
All HPC users are divided into two groups: faculty (which also includes Ph.D. students) and student. Separate quotas and restrictions are applied to each group.
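For example, finished results can be moved from /scratch to the permanent data area before the 15-day purge with standard commands (a sketch; <username> and the results directory are placeholders):

# Copy results from node-local /scratch to the permanent data area,
# then remove the scratch copy (paths are illustrative placeholders)
rsync -av /scratch/<username>/results/ /home/<username>/results/
rm -rf /scratch/<username>/results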

6. Quota and Restrictions:-
Quotas and restrictions have been implemented on the HPC. They fall into two broad categories: CPU core restrictions and data restrictions.

CPU core restrictions
a. No user in the student group may submit more than 2 jobs at a time (unless special permission is granted).
b. No user in the faculty group may submit more than 4 jobs at a time (unless special permission is granted).
c. No user, irrespective of group, may use more than 30 CPU cores per user account, except users of the GPU queue and the unrestricted queue (unless special permission is granted).
d. GPU queue users, irrespective of group, may take at most 4 CPU cores and all CUDA cores, but may not submit more than 2 jobs (student group) or 4 jobs (faculty group) at a time (unless special permission is granted).

Data restrictions
a. Faculty group users are allowed 40 GB of user data space each.
b. Student group users are allowed 10 GB of user data space each.
c. There is no quota or restriction imposed on /ssd-scratch and /scratch.

7. Scratch Area Deletion Policy:-
Data saved under /scratch is deleted every 15 days. Purging targets files last accessed more than 15 days ago. Users are therefore advised to move data they want to keep from /scratch to their data area, i.e. /home/<username>.

8. How to Login to HPC:-
Every user has to log in to the master node to submit jobs to any of the queues described in section 9. Direct login to the compute nodes is not allowed; if required, users must first log in to the master node and then log in to a compute node from there. Steps to log in to the master node from Linux/Mac and Windows machines are given below (a worked example follows this section).

For Linux/Mac users:
a. Run: ssh <username>@hpc.iiitd.edu.in
b. Use the same <username> as used in the IIITD domain.
c. The first-time password is the first 3 characters of the username followed by @4321. For example, if the username is helpdesk, the password is hel@4321.
d. Each user is forced to change this password at first login.

For Windows users:
a. Use PuTTY with hpc.iiitd.edu.in as the host name and port 22.
b. At the prompt, enter your <username>. Use the same <username> as used in the IIITD domain.
c. Enter your password when asked. The first-time password is the first 3 characters of the username followed by @4321. For example, if the username is helpdesk, the password is hel@4321.
d. Each user is forced to change this password at first login.
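Worked example of the Linux/Mac login steps above (helpdesk is the placeholder username from section 8):

# First login to the master node (replace helpdesk with your IIITD username)
ssh helpdesk@hpc.iiitd.edu.in
# First-time password: first 3 characters of the username + @4321, here hel@4321
# You will be forced to set a new password before the session continues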

9. Queue Details:-
The following queues have been created for job submission. The number of CPU cores allocated to each queue is given below:

Queue Name        Total Cores
---------------------------------------------------------
all.q             80
gpu.q             20
highpri.q         16
unrestricted.q    77

all.q: This queue is available to both the faculty and student groups. A total of 80 CPU cores are available, with a limit of 30 CPU cores per user account. No user may submit more than 2 jobs at a time (unless specific permission is granted). All general programming jobs are expected to be submitted to this queue. All compute node CPU cores are dedicated to this queue. Waiting time is expected to be higher, as this queue is shared by all users.

gpu.q: This queue is available to both the faculty and student groups. It provides 4 CPU cores and 2496 CUDA cores. There are no restrictions on this queue except that no user may submit more than 2 jobs at a time (unless specific permission is granted). All GPU/CUDA programming jobs are expected to be submitted to this queue. Compute node compute-0-3 is dedicated to this queue. Waiting time is expected to be much lower, as this queue is shared only by GPU/CUDA users.

highpri.q: This queue is available only to the faculty group. A total of 16 CPU cores are available. There are no restrictions on this queue except that no user may submit more than 2 jobs at a time (unless specific permission is granted). Users are expected to use this queue for high-priority jobs that need few CPU cores but a short waiting time.

unrestricted.q: All 77 CPU cores are available in this queue. This queue is available only to the faculty group. Jobs submitted to this queue will wait until all the required CPU cores are free and not in use by any other queue or job.
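Queue availability and load can be checked from the master node with standard Grid Engine commands, for example (a sketch; the exact output format depends on the Grid Engine version installed):

# List all configured queues
qconf -sql

# Summary of used and available slots per cluster queue
qstat -g c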

10. Job Submission Steps:-
Steps for submitting jobs are given below.

How to submit a parallel (Open MPI) job
Create the following script and save it as a shell file, for example Jobscript.sh:

#!/bin/bash
# Job name
#$ -N Parallel-Job
# Run the job from the current working directory
#$ -cwd
#$ -S /bin/bash
# Error and output files, named after the job id and job name
#$ -e err.$JOB_ID.$JOB_NAME
#$ -o out.$JOB_ID.$JOB_NAME
# Queue to submit the job to (e.g. all.q)
#$ -q <queue name>
# Number of MPI slots; the upper limit is the number of cores available to the chosen queue
#$ -pe mpi 8
unset SGE_ROOT
# a.out is the compiled program; <output_file> is the file the output is redirected to
/opt/mpi/1.8.1/bin/mpirun -np $NSLOTS -hostfile $TMPDIR/machines ./a.out > <output_file>

Submit the job using the following command:
qsub <Jobscript name>
In our example:
qsub Jobscript.sh

To check the status of submitted jobs, use the following commands:
qstat        (status listing of your jobs)
qstat -f     (full listing, including queue and node details)
qstat -g c   (summary of used and available slots per queue)
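The example above covers MPI jobs; a single-core job follows the same pattern without the parallel environment line. A minimal sketch under the same Grid Engine setup (the job name, script name and program name are placeholders):

#!/bin/bash
# Minimal single-core job script (placeholder names)
#$ -N Serial-Job
#$ -cwd
#$ -S /bin/bash
#$ -e err.$JOB_ID.$JOB_NAME
#$ -o out.$JOB_ID.$JOB_NAME
#$ -q all.q
./a.out > output.txt

Submit it in the same way, e.g. qsub SerialJob.sh (where SerialJob.sh is whatever file name the script was saved under).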