
Answers to Federal Reserve Questions Training for University of Richmond

2 Agenda Cluster Overview Software Modules PBS/Torque Ganglia ACT Utils

3 Cluster overview Systems: Ethernet switch, IPMI switch, 1x head node, 1x storage node, 20x compute nodes (physics01-20). Network: 1Gb Ethernet to the compute nodes, 10Gb Ethernet to the head and storage nodes. [Diagram: head and storage on 10Gb Ethernet; compute nodes physics01-20 on 1Gb Ethernet; 100Mb IPMI network.] All nodes are connected to a dedicated IPMI management network.

4 Network Private network for node-to-node communication: 10.1.1.0/24. Private network for IPMI communications: 10.1.3.0/24. The head node is the only machine with a public IP address and provides a firewall to protect the cluster network.

5 Cluster overview Nodes: dual 6-core Intel Xeon Westmere CPUs at 2.66GHz, 24GB of RAM, a pair of 500GB drives in a RAID0 striped configuration. Head: dual 6-core Intel Xeon Westmere CPUs at 2.66GHz, 24GB of RAM, 4x 500GB drives in a RAID10; NFS exports the /act filesystem to all nodes.

6 Cluster overview Head (continued): runs the Torque scheduler and server processes, the DHCP server for the cluster network, and the web server for Ganglia monitoring. Storage: single 6-core Intel Xeon Westmere CPU at 2.66GHz, 12GB of RAM, 8x 1TB drives in a RAID5 + hot spare; NFS exports the /home filesystem to all nodes.

7 Modules command Modules is an easy way to set up the user environment for different pieces of software (paths, variables, etc.). Set up your .bashrc or .cshrc:
source /act/etc/profile.d/actbin.[sh|csh]
source /act/modules/3.2.6/init/[bash|csh]
module load null
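
For example, a bash user's .bashrc could contain the following lines (a minimal sketch using the bash variants of the files above):

# enable the ACT tools and the Modules system for bash logins
source /act/etc/profile.d/actbin.sh
source /act/modules/3.2.6/init/bash
module load null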

8 Modules continued To see what modules you have available: module avail To load the environment for a particular module: module load modulename To unload the environment: module unload modulename module purge (removes all modules from the environment) Modules are stored in /act/modules/3.2.6/modulefiles - you can customize this for your own software
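
A typical session might look like this (the module name openmpi is only an illustration; run module avail to see what is actually installed on your cluster):

module avail            # list everything under /act/modules/3.2.6/modulefiles
module load openmpi     # hypothetical name; adjusts PATH, LD_LIBRARY_PATH, etc.
module list             # show what is currently loaded
module unload openmpi   # remove just that module
module purge            # remove all loaded modules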

9 PBS/Torque introduction Basic setup Submitting serial jobs Submitting parallel jobs Job status Interactive jobs Managing the queuing system

10 Basic setup 3 main pieces: pbs_server - the main server component, responds to user commands, etc. pbs_sched - decides where to run jobs pbs_mom - a daemon that runs on every node and executes jobs Each node has 12 slots which can be used for any number of jobs There is currently only 1 queue, named batch; it's set up as a FIFO (first-in first-out)
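
To confirm this setup from a shell on the head node, the standard Torque client commands can be used (output will of course vary by site):

qstat -q          # list the configured queues - here just "batch" - and their limits
qmgr -c 'p s'     # print the server and queue configuration (usually readable by normal users)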

11 Submitting batch jobs Basic syntax: qsub jobscript Job scripts are simple shell scripts, in either sh or csh, which at a minimum contain the name of your program. Here is a minimal job script:
#!/bin/bash
/path/to/executable

12 Common qsub arguments -q queuename name of the queue to run the job in -N jobname a descriptive name for the job -o filename path of the file to write the contents of STDOUT to -e filename path of the file to write the contents of STDERR to

13 Common qsub arguments -j oe Join the contents of STDERR and STDOUT into one file -m [a|b|e] Send out e-mail at different states (a = job aborted, b = job begins, e = job ends) -M emailaddr email address to send messages to -l resourcename=value[,resourcename=value] a list of resources needed to run this job

14 Resource options walltime maximum amount of real time the job can be running (if exceeded it will be terminated) mem maximum amount of memory to be consumed by the job nodes number of nodes requested to run this job
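
Putting the arguments from the last three slides together, a submission might look like the following; the job name, e-mail address, script name, and resource amounts are only illustrative values:

qsub -N myjob -j oe -m ae -M user@example.edu \
     -l nodes=1:ppn=12,walltime=24:00:00,mem=8gb myscript.sh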

15 Submitting serial jobs A moderately complex job script can supply command line parameters to PBS (on lines prefixed with #PBS) that you may have left off of qsub, as well as perform environment setup before running your program:
#!/bin/bash
#PBS -N testjob
#PBS -j oe
#PBS -q batch
echo Running on `hostname`.
echo It is now `date`.
sleep 60
echo It is now `date`.
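
To run this example, save it to a file (the name testjob.sh below is just an illustration) and hand it to qsub:

qsub testjob.sh
# prints the new job's ID; with -j oe, STDOUT and STDERR end up together
# in testjob.o<jobid> in the directory the job was submitted from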

16 Submitting parallel jobs Very similar to serial jobs except for one new argument: -l nodes=x:ppn=y nodes: the number of physical servers to run on ppn: processors per node to run on, e.g. 12 to run on all 12 cores Examples: run on 2 nodes using 12 cores per node, for a total of 24 cores: -l nodes=2:ppn=12 run on 4 nodes using 1 core per node, and 2 nodes using 2 cores per node: -l nodes=4:ppn=1+2:ppn=2
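
For an MPI program, a job script along the following lines is typical. This is a sketch only: it assumes an MPI implementation whose mpirun accepts a hostfile, and the module name and program path are hypothetical:

#!/bin/bash
#PBS -N mpitest
#PBS -j oe
#PBS -l nodes=2:ppn=12
cd $PBS_O_WORKDIR                    # start where the job was submitted from
module load openmpi                  # hypothetical module name; check module avail
NP=$(wc -l < $PBS_NODEFILE)          # Torque lists the assigned cores, one per line, in this file
mpirun -np $NP -hostfile $PBS_NODEFILE ./my_mpi_program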

17 Job status You can check the status of your own job submissions by looking at the output of "qstat". qstat only shows your own jobs by default. To show jobs for all users, run "qstat -u '*'". To examine the details of a job, use "qstat -f jobid". Common job states: R = running Q = queued E = exiting

18 Interactive jobs Using qsub -I you can submit an interactive job. When the job is scheduled, it lands you in a shell on the remote machine. You can pass any argument that you'd normally pass to qsub (e.g. qsub -I -N name -l nodes=1:ppn=5). When you exit, the resources are immediately freed for others to use.
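
For example, to get an interactive shell on one node with all 12 cores for up to two hours (the resource values are just an illustration):

qsub -I -N interactive -l nodes=1:ppn=12,walltime=2:00:00
# wait for the scheduler; a prompt opens on the assigned compute node
# run your work, then type exit to release the resources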

19 Managing the queuing system qdel - delete a job that has been submitted qalter - alter a job after submission qhold - hold a job in the queue so it does not execute qrls - release a hold on a job pbsnodes - see the nodes configured in the system pbsnodes -o nodename - take a node offline from the queuing system pbsnodes -c nodename - clear the offline state of a node qmgr - create queues and manage system properties
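
A few concrete invocations (the job ID and node name are placeholders):

qhold 123               # keep job 123 queued but stop it from starting
qrls 123                # allow it to run again
qdel 123                # remove it entirely
pbsnodes -o physics07   # drain physics07, e.g. before hardware work
pbsnodes -c physics07   # put it back in service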

20 More information Administrator manual: http://www.clusterresources.com/products/torque/docs/

21 Ganglia Ganglia is installed on the master node and available at http://test3.advancedclustering.com/ganglia/ The gstat command is also available - it provides a command-line overview of the data Ganglia collects.

22 ACT Utils ACT Utils is a suite of commands to assist in managing your cluster; it contains the following commands: act_authsync - sync user/password/group information across nodes act_cp - copy files across nodes act_exec - execute any Linux command across nodes act_netboot - change network boot functionality for nodes act_powerctl - power on, off, or reboot nodes via IPMI or PDU act_sensors - retrieve temperatures, voltages, and fan speeds act_console - connect to a host's serial console via IPMI

23 ACT Utils common arguments All utilities have a common set of command line arguments that can be used to specify which nodes to interact with: --all all nodes defined in the configuration file --exclude a comma separated list of nodes to exclude from the command --nodes a comma separated list of node hostnames (e.g. physics01,physics02) --groups a comma separated list of group names (e.g. nodes) --range a range of nodes (e.g. physics01-physics05) Configuration (including groups and nodes) is defined in /act/etc/act_utils.conf

24 Groups defined on your cluster: nodes - all compute nodes

25 ACT Utils examples Find the current load on all the compute nodes:
act_exec -g nodes uptime
Copy the /etc/resolv.conf file to all the compute nodes:
act_cp -g nodes /etc/resolv.conf /etc/resolv.conf
Shut down every compute node except physics01:
act_exec --group=nodes --exclude=physics01 /sbin/poweroff
Tell nodes physics01 - physics03 to boot into cloner on next boot:
act_netboot --nodes=physics01,physics02,physics03 --set=cloner-v3.14

26 ACT Utils examples Check the CPU temperatures on all nodes:
act_sensors -g nodes temps
Connect to the console of physics05:
act_console --node=physics05
If connected with X11 forwarding:
act_console --use_xterm --node=physics05
Hardware power control:
act_powerctl --group=nodes [on|off|reboot|status]

27 Shutting the system down To shut the system down for maintenance, power off the compute nodes and the storage node:
act_exec -g nodes --nodes=storage /sbin/poweroff
Then shut down the head:
/sbin/poweroff