Track 3: Molecular Visualization and Virtual Screening
NBCR Summer Institute
Session: NBCR clusters introduction
August 11, 2006
Nadya Williams, nadya@sdsc.edu
8/11/06 © 2006 UC Regents

Where to start
  National Biomedical Computation Resource (NBCR): http://nbcr.net
  How to get an account: https://nbcr.net/accounts/apply.php
  Familiarize yourself with the account policy: https://nbcr.net/accounts/toa.php
  Subscribe to the NBCR-support mailing list: https://lists.nbcr.net/mailman/listinfo.cgi/nbcr-support
  Subscribe to the NBCR-announce mailing list: https://lists.nbcr.net/mailman/listinfo.cgi/nbcr-announce

Where to get help
  For support, email support@nbcr.net
  User services web page: https://nbcr.net/userservices.php
    Access to training sessions on the Wiki
    Tools/downloads
    Documentation
    Cluster monitoring access
  User guides on the Wiki: https://www.nbcr.net:443/pub/wiki/

Remote login
  For login use ssh (not rsh or telnet)
    % ssh accname@clustername
  or
    % ssh clustername -l accname
  At first login, set up your ssh keys
    If you are not prompted and there is no ~/.ssh/ directory, run
    % ssh-keygen -t rsa
  Add your local ssh keys (the scp and ssh commands are run on your local machine)
    % scp -p ~/.ssh/id_dsa.pub user@kryptonite.nbcr.net:local.pub
    % ssh user@kryptonite.nbcr.net
    % cat local.pub >> ~/.ssh/authorized_keys
    % rm local.pub
  Available clusters:
    kryptonite.nbcr.net
    athena.nbcr.net

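The "Add your local ssh keys" steps on the Remote login slide append a key every time they are run. A minimal sketch of an append that skips keys already present is shown here; `add_public_key` is a hypothetical helper name (not an NBCR or OpenSSH tool), and the demo works on scratch files only.

```shell
#!/bin/sh
# add_public_key PUBFILE AUTHFILE
# Hypothetical helper: append PUBFILE to AUTHFILE unless the key is already listed.
add_public_key() {
    pubfile=$1
    authfile=$2
    mkdir -p "$(dirname "$authfile")"
    touch "$authfile"
    # Keep the file private to its owner.
    chmod 600 "$authfile"
    # -F fixed strings, -x whole-line match, -q quiet, -f read patterns from PUBFILE
    if grep -qxF -f "$pubfile" "$authfile"; then
        echo "key already present"
    else
        cat "$pubfile" >> "$authfile"
        echo "key added"
    fi
}

# Demo on scratch files; the key text is a made-up placeholder.
tmp=$(mktemp -d)
echo "ssh-rsa AAAAexample user@laptop" > "$tmp/local.pub"
add_public_key "$tmp/local.pub" "$tmp/ssh/authorized_keys"   # prints "key added"
add_public_key "$tmp/local.pub" "$tmp/ssh/authorized_keys"   # prints "key already present"
rm -rf "$tmp"
```

On the cluster the equivalent call would be `add_public_key local.pub ~/.ssh/authorized_keys`, followed by `rm local.pub` as on the slide.
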
Keys management with Agent
  Log in to a cluster
    % ssh user@kryptonite.nbcr.net
  Start an agent
    % ssh-agent $SHELL
  Add identities to your agent
    % ssh-add
  or, for a private key stored elsewhere,
    % ssh-add ~/.ssh/mykeys/my_special_key
  Verify that identities are added
    % ssh-add -l
    OK:
      1024 e9:a6:59:89:f0:f1:87:8e:88:54 /Users/nadya/.ssh/id_dsa (DSA)
    ERROR:
      Could not open connection to your authentication agent
  You can now execute any command on any node
    % cluster-fork ps -u$user
    % ssh c0-0

Introduction to Sun Grid Engine
  What is a grid?
    A collection of computing resources that perform tasks
    A grid node can be a compute server, data collector, visualisation terminal, ...
  SGE is resource management software
    Accepts jobs submitted by users
    Schedules them for execution on appropriate systems based on resource management policies
    You can submit hundreds of jobs without worrying about where they will run

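The "Verify that identities are added" step on the Keys management slide can also be checked from a script, because `ssh-add -l` reports its result through the exit status (0 when identities are listed, 1 when the command fails, for example because the agent holds no identities, and 2 when the agent cannot be contacted). A small sketch follows; `describe_agent_status` is a hypothetical helper name.

```shell
#!/bin/sh
# describe_agent_status STATUS
# Hypothetical helper: translate the exit status of `ssh-add -l` into a message.
describe_agent_status() {
    case "$1" in
        0) echo "agent running, identities loaded" ;;
        1) echo "agent running, no identities loaded" ;;
        2) echo "cannot contact agent" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Ask the agent for its identities and report what the status means.
ssh-add -l >/dev/null 2>&1
describe_agent_status $?
```

If the message is "cannot contact agent", start one with `ssh-agent $SHELL` and run `ssh-add` again.
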
What is SGE?
  Two versions of SGE:
    Sun Grid Engine (on Rocks clusters)
      Distributed under an open source license
      From sunsource.net
    Sun N1 Grid Engine
      N1 stack is available at no cost
      Paid support from Sun

Job Management
  Running jobs directly is not recommended!
  Use the installed load scheduler
    Sun Grid Engine
      Load management tool for HETEROGENEOUS distributed computing environments
    PBS/Torque
      More sophisticated scheduling
  Why?
    You can submit multiple jobs and have them queued (and go home!)
    Fair share
    Allows other people to use the cluster too! (for Myrinet MPI jobs)

Host Roles
  Master Host
    Controls overall cluster activity
    Frontend, head node
    Runs the master daemon, sge_qmaster, which controls queues, jobs, status, and user access permissions
    Also runs the scheduler: sge_schedd
  Execution Host
    Executes SGE jobs
    Runs the execution daemon: sge_execd
    Runs jobs on its host
    Forwards system status/info to sge_qmaster

Host Roles continued
  Submit Host
    Allowed to submit and control batch jobs only
    No daemon is required on this type of host
  Administration Host
    Usually the SGE administrator's console

Job Management
  Your administrator must set up a global default queue (all.q)
  More fine-tuned queues can be set up depending on the cluster/user community
    short.q, long.q, weekend.q, fluent.0.q, fluent.1.q
  As a user, you only need to know how to
    Submit your jobs (serial or MPI)
    Monitor your jobs
    Get the results

Some SGE Commands
  Command   Description
  qconf     SGE cluster, queue, etc. configuration
  qmod      Modify queue status: enabled or suspended
  qacct     Extract accounting information from the cluster
  qalter    Change the attributes of submitted but pending jobs
  qdel      Job deletion
  qhold     Hold back submitted jobs from execution
  qhost     Show status information about SGE hosts
  qmon      X-windows Motif interface
  qrsh      SGE queue-based rsh facility
  qselect   List queues matching selection criteria
  qsh       Open an interactive shell on a lightly loaded host
  qstat     Status listing of jobs and queues
  qsub      Command-line interface to submit jobs to SGE
  qtcsh     SGE queue-based tcsh facility

  qtcsh, qsh: extended command shells that can transparently distribute execution of programs/applications to the least loaded hosts via SGE.

$ qhost
  $ qhost
  HOSTNAME     ARCH        NPROC  LOAD  MEMTOT  MEMUSE  SWAPTO   SWAPUS
  -----------------------------------------------------------------------
  global       -           -      -     -       -       -
  compute-0-1  lx26-amd64  4      6.00  3.9G    585.3M  1000.0M  0.0
  compute-0-2  lx26-amd64  4      5.00  3.9G    560.9M  1000.0M  0.0
  compute-0-3  lx26-amd64  4      4.00  3.9G    534.3M  1000.0M  0.0
  compute-0-4  lx26-amd64  4      5.00  3.9G    555.0M  1000.0M  0.0
  compute-0-5  lx26-amd64  4      5.00  3.9G    559.8M  1000.0M  0.0
  compute-0-6  lx26-amd64  4      5.00  3.9G    561.5M  1000.0M  0.0
  compute-0-7  lx26-amd64  4      5.01  3.9G    551.6M  1000.0M  0.0
  compute-0-8  lx26-amd64  4      5.00  3.9G    557.9M  1000.0M  0.0
  compute-0-9  lx26-amd64  4      4.02  3.9G    541.8M  1000.0M  0.0

QMON
  Graphical interface for SGE administration/submission
  Requires either Linux/Unix on your desktop or an X emulator (Hummingbird) on your Windows PC

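The qhost listing on the "$ qhost" slide is plain whitespace-separated text, so it can be post-processed with standard tools. A minimal sketch, assuming the column layout shown on that slide (hostname in column 1, LOAD in column 4); `least_loaded_host` is a hypothetical helper name.

```shell
#!/bin/sh
# least_loaded_host
# Hypothetical helper: read qhost-style output on stdin and print the
# execution host with the lowest LOAD value, followed by that value.
least_loaded_host() {
    awk 'NR > 2 && $1 != "global" && $4 ~ /^[0-9.]+$/ {
             # Compare numerically, but remember the load as it was printed.
             if (best == "" || $4 + 0 < min) { min = $4 + 0; best = $1; load = $4 }
         }
         END { if (best != "") print best, load }'
}

# Demo on three rows copied from the sample listing.
least_loaded_host <<'EOF'
HOSTNAME ARCH NPROC LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-----------------------------------------------------
global - - - - - - -
compute-0-1 lx26-amd64 4 6.00 3.9G 585.3M 1000.0M 0.0
compute-0-3 lx26-amd64 4 4.00 3.9G 534.3M 1000.0M 0.0
compute-0-9 lx26-amd64 4 4.02 3.9G 541.8M 1000.0M 0.0
EOF
```

On a cluster the same helper would be fed live data with `qhost | least_loaded_host`. In normal use the scheduler makes this choice for you; the sketch is only meant for quick inspection.
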
Submitting Jobs
  Command line (qsub) and graphical (qmon)
  Standard, Batch, Array, Interactive, Parallel
  SGE schedules jobs based on
    Job priorities
      User -> FIFO
      Admin -> can affect with priority settings
    Equal-Share-Scheduling
      Scheduler -> user_sort setting
      Prevents a single user from hogging the queues
      Recommended!!!

$ qsub
  Output/error files are written to your home directory by default
  Look in /opt/gridengine/examples/jobs
    % qsub simple.sh
    your job 224 ("simple.sh") has been submitted
    % cd ~
    % more simple.sh.e224
    % more simple.sh.o224
    Wed Aug 9 14:56:16 PDT 2006
    Wed Aug 9 14:56:36 PDT 2006
  Use qstat to check job status
  Example job script:
    #!/bin/sh
    date
    sleep 10
    hostname

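The "$ qsub" slide shows that the job number printed at submission (224) reappears in the names of the output and error files. A small sketch of extracting that number from the submission message; `job_id_from_message` is a hypothetical helper, and the message text is the one from the slide.

```shell
#!/bin/sh
# job_id_from_message
# Hypothetical helper: read qsub's submission message on stdin, print the job id.
job_id_from_message() {
    sed -n 's/^[Yy]our job \([0-9][0-9]*\) .*/\1/p'
}

# Message copied from the slide; on a cluster use: msg=$(qsub simple.sh)
msg='your job 224 ("simple.sh") has been submitted'
jobid=$(echo "$msg" | job_id_from_message)
echo "stdout file: simple.sh.o$jobid"
echo "stderr file: simple.sh.e$jobid"
```

The same id is what `qstat` lists and what `qdel` expects.
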
Submit autodock job
    % qsub adsub.sh
    your job 225 ("adsub.sh") has been submitted
  adsub.sh:
    #!/bin/sh
    # request Bourne shell as shell for job
    #$ -S /bin/sh
    # work from current dir and put stderr/stdout here
    #$ -cwd
    ulimit -s unlimited
    autodock3 -p test.dpf -l test.dlg
    status=$?
    if [ "$status" = "0" ] ; then
        echo "successful completion $status"
    else
        echo "error running autodock3"
    fi

GUI
  Submit
  Monitor
  Control

$ qconf
  Show all the queues
    % qconf -sql
  Show the given queue
    % qconf -sq all.q
  Show command usage
    % qconf -help
  Show complex attributes
    % qconf -sc

Advanced Submit
  Advanced or batch jobs == shell scripts
  Can be as complicated as you want, or even an application!
    #!/bin/bash
    #
    # compiles my program every time, creates the executable and runs it!
    #
    # change to my working directory
    cd TEST
    # compile the job
    f77 flow.f -o flow -lm -latlas
    # run the job
    ./flow myinput.dat

Requestable Attributes
  Users submit jobs by specifying a job requirement profile of the hosts or of the queues
  SGE matches the job requirements and runs the job on suitable hosts
  Attributes
    Disk space
    CPU
    Memory
    Software (Fluent licence)
    OS

Attributes continued
  Relop
    Relational operation used to compute whether a queue meets a user request
  Requestable
    Whether the attribute can be specified by the user (e.g. in qsub)
  Consumable
    Manages limited resources, e.g. licences or CPUs

  #name     shortcut  type    value  relop  requestable  consumable  default
  arch      a         STRING  none   ==     YES          NO          none
  num_proc  p         INT     1      ==     YES          NO          0
  load_avg  la        DOUBLE  99.99  >=     NO           NO          0
  slots     s         INT     0      <=     YES          YES         1

    % qsub -l arch=glinux,load_avg=0.01 myjob.sh

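The relop column in the table above decides whether a queue satisfies a request. The sketch below is a simplified illustration only, assuming the comparison is evaluated as "requested value RELOP configured value"; `request_matches` is a hypothetical helper and not part of SGE.

```shell
#!/bin/sh
# request_matches REQUEST RELOP VALUE
# Hypothetical helper: print "match" when "REQUEST RELOP VALUE" holds.
# Assumption: the user's request is the left-hand side of the comparison.
request_matches() {
    request=$1
    relop=$2
    value=$3
    awk -v r="$request" -v op="$relop" -v v="$value" 'BEGIN {
        if (op == "==")      ok = (r == v)           # numbers or strings
        else if (op == "<=") ok = (r + 0 <= v + 0)   # numeric only
        else if (op == ">=") ok = (r + 0 >= v + 0)   # numeric only
        else                 ok = 0
        print (ok ? "match" : "no match")
    }'
}

request_matches 2 "<=" 4                 # slots: 2 requested, 4 configured
request_matches glinux "==" lx26-amd64   # arch: the strings differ
```

The first call prints "match" and the second "no match". Several requests go into a single `-l` option as a comma-separated list, for example `-l arch=lx26-amd64,slots=2`.
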
Attributes continued
  By default, all requests are hard
  Hard requests are checked first, followed by soft
  If a hard request is not satisfied, the job is not run
  For soft requests, SGE attempts to run on the best fit
  Important resources
    mt - memory total
    mf - memory free
    s - processor slots
    st - total swap
  How to request specific memory/swap space/CPU?
    % qsub -soft -l mt=250k,st=100k,mf=300g simple.sh
    % qsub -hard -l mt=250k,st=100k,mf=300g simple.sh

Array Jobs
  Parameterized and repeated execution of the same program (in a script) is ideal for the array job facility
  SGE provides an efficient implementation of array jobs
  Handles computations as an array of independent tasks joined into a single job
  Can be monitored and controlled as a whole, by individual task, or by subset of tasks

$ qsub
  Submitting an array job from the command line
    -l option requests a hard CPU time limit of 45 minutes
    -t option defines the task index range
      2-10:2 specifies 2, 4, 6, 8, 10
    Tasks use $SGE_TASK_ID to find out whether they are task 2, 4, 6, 8 or 10
      To find their input record
      As a seed for a random number generator
    % qsub -l h_cpu=0:45:0 -t 2-10:2 render.sh data.in

Job cleanup
  Use the SGE command
    % qdel <job_id>
  Use the Rocks command
    % cluster-fork killall <your_executable_name>

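The array job slide says each task uses $SGE_TASK_ID to find its input record. A minimal sketch of what a task script such as render.sh might do, assuming one record per line in the input file; `record_for_task` and `task_ids` are hypothetical helpers, and the default task id of 2 exists only so the script can be tried outside SGE.

```shell
#!/bin/sh
# record_for_task FILE N
# Hypothetical helper: print line N of FILE (the record for task N).
record_for_task() {
    sed -n "${2}p" "$1"
}

# task_ids FIRST LAST STEP
# Hypothetical helper: list the indices that "-t FIRST-LAST:STEP" generates.
task_ids() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "$i"
        i=$((i + $3))
    done
}

# Indices for -t 2-10:2, printed on one line.
task_ids 2 10 2 | tr '\n' ' '
echo

# Inside an array job SGE sets SGE_TASK_ID; the input file is the first argument.
input=${1:-data.in}
task=${SGE_TASK_ID:-2}
if [ -r "$input" ]; then
    echo "task $task: $(record_for_task "$input" "$task")"
fi
```

Using the task id as a random seed works the same way: pass `$SGE_TASK_ID` to the program as its seed argument.
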
SGE submit script
  Script contents
    #!/bin/tcsh
    #$ -S /bin/tcsh
    setenv MPI /opt/mpich/gnu/bin
    $MPI/mpirun -machinefile machines -np $NSLOTS appname
  Make it executable
    $ chmod +x runprog.sh

Submit file options
    # meet given resource request
    #$ -l h_rt=600
    # specify interpreting shell for the job
    #$ -S /bin/sh
    # use path for standard output of the job
    #$ -o /your/path
    # execute from current dir
    #$ -cwd
    # run on 32 processes in mpich PE
    #$ -pe mpich 32
    # Export all environment variables
    #$ -V
    # Export these environment variables
    #$ -v MPI_ROOT,FOOBAR=BAR
  See man qsub for more options

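Lines beginning with `#$` in a submit file are comments to the shell, but qsub reads them as options. The sketch below writes a small submit file built from options on the slide and lists the options it embeds; `embedded_options` is a hypothetical helper, and the script body is a placeholder rather than a real application.

```shell
#!/bin/sh
# Write a small submit file to a scratch location.
script=$(mktemp)
cat > "$script" <<'EOF'
#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -l h_rt=600
#$ -pe mpich 32
echo "running on $NSLOTS slots"
EOF

# embedded_options FILE
# Hypothetical helper: print the qsub options carried by the "#$" lines of FILE.
embedded_options() {
    sed -n 's/^#\$ *//p' "$1"
}

embedded_options "$script"
rm -f "$script"
```

The four lines printed are the options the file carries; giving the same options on the qsub command line has the same effect as embedding them.
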
Online resources
  Rocks documentation: http://www.rocksclusters.org/wordpress
  Rocks SGE roll documentation: http://www.rocksclusters.org/rolldocumentation/sge/4.1/