Viglen NPACI Rocks Getting Started and FAQ
Table of Contents

Getting Started
  Powering up the machines
  Checking node status
    Through web interface
  Adding users
  Job Submission
FAQ
  Reinstalling a compute node
  Adding applications to the cluster
  Enabling web access to the cluster
  Synchronising files across the cluster
Getting Started

Powering up the machines:
Power on the headnode first and allow it to boot to the login screen. The compute nodes rely on the headnode when starting client daemons, such as the queue client. Once the headnode is up, power on the rest of the compute nodes.
Checking node status
Once all the nodes are powered on, you can check for any dead nodes using the CLI or the web interface.

Through CLI:
Login as root and run the following:

[root@headnode:~]$ tentakel uptime
### compute-0-0 (stat:0, dur(s): 0.32)
down

Any nodes that failed to boot will be reported as down.

Through web interface:
The Ganglia scalable distributed monitoring system is configured on the cluster and can be used to discover dead nodes (along with cluster usage metrics). To access the Ganglia web interface, launch the firefox browser from the headnode. (Please note that external web access is disabled by default through the iptables configuration on the headnode; only ssh access is permitted.) When logging into the headnode, make sure to pass the -X argument to ssh to enable X forwarding:

[user@host:~]$ ssh -X user@headnode
[user@headnode:~]$ firefox

Point the browser to http://localhost and you should see a welcome screen as below.
Click on the Cluster Status link at the right side of the screen to display the Ganglia monitoring screen.
Any nodes that are down or in use are reported through this interface. Please note that a node reported as dead here is not necessarily dead; it could just mean that the ganglia monitoring daemon on that node has died. It is recommended that users confirm the node status using tools such as ssh and ping.
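As a quick follow-up check, the hosts reported as down can be pulled out of the tentakel output and then probed individually. The sketch below is a hedged example: it assumes the one-line report format shown earlier ("### <host> (stat:...) down") and uses a canned sample in place of the live `tentakel uptime` output.

```shell
# Sketch: extract hosts reported "down" from tentakel-style output.
# Assumes the one-line report format shown above; in practice you would
# pipe the real `tentakel uptime` output in instead of this sample.
tentakel_report='### compute-0-0 (stat:0, dur(s): 0.32) down
### compute-0-1 (stat:0, dur(s): 0.30) 10:02:11 up 3 days'

down_hosts=$(echo "$tentakel_report" | awk '/ down$/ {print $2}')
echo "$down_hosts"
```

Each host printed can then be confirmed directly, e.g. with ping -c 1 compute-0-0 or ssh compute-0-0 uptime.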
Adding users:
To add a user, login as root, issue the useradd command, set a password and synchronise the account across the cluster with the rocks sync users command:

[root@headnode:~]$ useradd <username>
[root@headnode:~]$ passwd <username>
[root@headnode:~]$ rocks sync users

It is also recommended that the cluster administrator logs in as the new user and sets up the ssh keys to allow passwordless access on the cluster. The first login on a newly created account will prompt for the path to the rsa key pair and an ssh passphrase. Leave all values blank for normal cluster operation.
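The passwordless-key setup described above can also be done non-interactively with ssh-keygen. This is a sketch, assuming OpenSSH's standard tools and home directories shared across the nodes (so authorized_keys is visible cluster-wide); the HOME override exists only so the sketch is safe to run anywhere.

```shell
# Sketch of the passwordless-key setup (assumes OpenSSH's ssh-keygen).
# HOME is redirected to a scratch directory here so the sketch is safe
# to run anywhere; on the cluster, run this as the new user instead.
export HOME=$(mktemp -d)                      # demo only; omit on the cluster
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
ssh-keygen -q -t rsa -N "" -f "$HOME/.ssh/id_rsa"   # empty passphrase
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
```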
Job Submission:
Jobs are submitted to the queue in the form of a (bash) script. This script should contain the commands and arguments that you want to run, exactly as if you were running them from the command line.

Sample Job Submit Script

#!/bin/bash
#
# SGE/PBS options can be specified here
# e.g. set job to run in the current working directory
#$ -cwd

$HOME/myjobdir/myjobexecutable arg1 arg2

Save this file as ~/myjob.sh. To submit this job to the queue, use the qsub command, passing the job script file as an argument, e.g.:

[user@headnode:~]$ qsub myjob.sh

Job status can be checked using qstat, and jobs can be removed using qdel <jobid>. For more detailed usage, please refer to the SGE/PBS manuals supplied with the cluster.
Sample MPICH Job (64 core MPICH Job):

#!/bin/bash
#PBS -S /bin/bash
#PBS -V
# Request job to run on 8 nodes with 8 processes per node
#PBS -l nodes=8:ppn=8

# Change to the working directory
cd ${PBS_O_WORKDIR}

# Set global mem size
export P4_GLOBMEMSIZE=64000000

# USERS MODIFY HERE
# Set file name and arguments
MPIEXEC=/opt/mpiexec/bin/mpiexec
EXECUTABLE=/opt/hpl/mpich-hpl/bin/xhpl
ARGS="-v"

# Load the mpich modules environment
module load mpich/1.2.7

##########################################
#                                        #
#  Output some useful job information.   #
#                                        #
##########################################
NPROCS=`wc -l < $PBS_NODEFILE`
echo ------------------------------------------------------
echo 'This job is allocated on '${NPROCS}' cpu(s)'
echo 'Job is running on node(s): '
cat $PBS_NODEFILE
echo ------------------------------------------------------
echo PBS: qsub is running on $PBS_O_HOST
echo PBS: originating queue is $PBS_O_QUEUE
echo PBS: executing queue is $PBS_QUEUE
echo PBS: working directory is $PBS_O_WORKDIR
echo PBS: execution mode is $PBS_ENVIRONMENT
echo PBS: job identifier is $PBS_JOBID
echo PBS: job name is $PBS_JOBNAME
echo PBS: node file is $PBS_NODEFILE
echo PBS: current home directory is $PBS_O_HOME
echo PBS: PATH = $PBS_O_PATH
echo ------------------------------------------------------

# Launch the Job
${MPIEXEC} ${ARGS} ${EXECUTABLE}

Viglen provide sample job submission scripts for all the above jobs in ~/qsub-scripts/. This directory is created automatically when a new user is added. Feel free to edit these scripts for use with local applications/codes.
FAQ

Reinstalling a compute node
The recommended way of dealing with any node failure is to reinstall the problematic compute node (unless a hardware fault is present). The compute nodes are all set to boot from the network first, then from local disk, so the headnode can force any compute node to reinstall by setting its pxeboot flag. To see what pxeboot flags are currently set on your cluster, use the rocks command:

[root@headnode:~]$ rocks list host pxeboot
HOST         ACTION
headnode:    ------
compute-0-0: os
compute-0-1: os
...

To flag a node for reinstallation, use the rocks command to set the pxeboot flag to install:

[root@headnode:~]$ rocks set host pxeboot compute-0-0 action=install

Then power cycle the node (or reboot it if the node is still alive). The node can also be set back to booting from local disk by setting the pxeboot flag back to os:

[root@headnode:~]$ rocks set host pxeboot compute-0-0 action=os

Power cycle the node:

If IPMI modules are installed:

ipmitool -U [user] -P [passwd] -H compute-0-0 chassis power off
ipmitool -U [user] -P [passwd] -H compute-0-0 chassis power on

If APC PDUs are used:

apc off compute-0-0
apc on compute-0-0

Note: If power cycling using the APC PDU, the BIOS power setting (Advanced Boot Features > Restore on AC Power Off) needs to be set to Last State on the compute node.
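When several nodes need reinstalling at once, the flag-and-power-cycle steps can be scripted. The loop below is a dry run that only prints the commands it would issue: remove the echo quoting to execute for real (rocks and ipmitool must be on the headnode's PATH, the [user]/[passwd] placeholders are yours to fill in, and `chassis power cycle` is ipmitool's one-step equivalent of the off/on pair above).

```shell
# Dry run: print the commands that would flag three nodes for
# reinstallation and power cycle them over IPMI. The commands are only
# echoed here so the sketch is safe to run anywhere.
cmds=$(
  for n in compute-0-0 compute-0-1 compute-0-2; do
    echo "rocks set host pxeboot $n action=install"
    echo "ipmitool -U [user] -P [passwd] -H $n chassis power cycle"
  done
)
echo "$cmds"
```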
Adding applications to the cluster
This topic is covered in detail in the Rocks user guide and online at http://www.rocksclusters.org/rolldocumentation/base/5.1/customization-adding-packages.html

Enabling web access to the cluster
By default, the firewall on the headnode will only allow ssh access. If you want to enable web access to the cluster (e.g. for monitoring through Ganglia), you can allow this through the iptables configuration file /etc/sysconfig/iptables. In the file, lines 18 and 19 are commented out; to enable web access, uncomment these and restart iptables (/etc/init.d/iptables restart):

# Uncomment the lines below to activate web access to the cluster.
#-A INPUT -m state --state NEW -p tcp --dport https -j ACCEPT
#-A INPUT -m state --state NEW -p tcp --dport www -j ACCEPT
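The uncommenting can also be done non-interactively with sed. The sketch below is demonstrated on a sample copy of the two rules so it is safe to run anywhere; on the headnode you would point the same sed expression at /etc/sysconfig/iptables and then restart iptables.

```shell
# Demonstrated on a sample copy of the two commented rules; on the
# headnode, apply the same sed to /etc/sysconfig/iptables instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#-A INPUT -m state --state NEW -p tcp --dport https -j ACCEPT
#-A INPUT -m state --state NEW -p tcp --dport www -j ACCEPT
EOF
sed -i 's/^#\(-A INPUT\)/\1/' "$sample"    # strip the leading "#"
cat "$sample"
# afterwards, on the headnode: /etc/init.d/iptables restart
```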
Synchronising files across the cluster
If there is a configuration file in /etc (or anywhere that's not an NFS share) that you want to synchronise across all nodes, there are two ways to do this.

Option 1: Through the extend-compute.xml file (*note* changes to the file on the headnode will not automatically be synchronised; this method is more useful for configurations that need to be consistent on compute nodes only)

In the <post> section of this file, create a <file> tag:

<post>
<file name="/path/to/file">
# contents of file
</file>
</post>

When finished, verify the additions are syntactically correct:

[user@host:site-profiles]$ xmllint extend-compute.xml

Rebuild the distribution:

[user@host:site-profiles]$ cd /home/install
[user@host:install]$ rocks create distro

Verify that kickstart files are being generated correctly (from the /home/install directory):

[user@host:install]$ ./sbin/kickstart.cgi --client="compute-0-0"

This command should dump the kickstart file to the screen (provided the node has already been installed and exists in the database). If not, repeat the above steps and try again. If the command is successful, set the pxeboot flag for the nodes (see "Reinstalling a compute node") and reboot the node.

Option 2: Through the 411 subsystem

Edit the file /var/411/files.mk. The last line should read:

#FILE += /path/to/file
Uncomment this line, update it with the path to the file you want to synchronise, and run the command rocks sync users to apply the change, e.g.:

FILE += /etc/hosts