ISTeC Cray High-Performance Computing System
Richard Casey, PhD, RMRCE, CSU Center for Bioinformatics
2 Compute Node Status
Check whether interactive and batch compute nodes are up or down: xtprocadmin

NID  (HEX)  NODENAME    TYPE     STATUS  MODE
12   0xc    c0-0c0s3n0  compute  up      interactive
13   0xd    c0-0c0s3n1  compute  up      interactive
14   0xe    c0-0c0s3n2  compute  up      interactive
15   0xf    c0-0c0s3n3  compute  up      interactive
16   0x10   c0-0c0s4n0  compute  up      interactive
17   0x11   c0-0c0s4n1  compute  up      interactive
18   0x12   c0-0c0s4n2  compute  up      interactive
42   0x2a   c0-0c1s2n2  compute  up      batch
43   0x2b   c0-0c1s2n3  compute  up      batch
44   0x2c   c0-0c1s3n0  compute  up      batch
45   0x2d   c0-0c1s3n1  compute  up      batch
61   0x3d   c0-0c1s7n1  compute  up      batch
62   0x3e   c0-0c1s7n2  compute  up      batch
63   0x3f   c0-0c1s7n3  compute  up      batch

Naming convention: CabinetX-Y, CageX, SlotX, NodeX; i.e. c0-0c0s3n0 = Cabinet0-0, Cage0, Slot3, Node0
Currently 960 batch compute cores and 288 interactive compute cores
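For scripted checks, the xtprocadmin listing can be filtered with standard text tools. A minimal sketch, assuming the six-column output layout shown above (NID, HEX, NODENAME, TYPE, STATUS, MODE):

  # Count compute nodes that are up, broken down by mode (interactive vs. batch)
  xtprocadmin | awk '$4 == "compute" && $5 == "up" { n[$6]++ } END { for (m in n) print m, n[m] }'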
3 Compute Node Status
Check the state of interactive and batch compute nodes and whether they are already allocated to other users' jobs: xtnodestat

Current Allocation Status at Tue Apr 19 08:15
(xtnodestat prints a per-cabinet map, organized by cage, slot/blade, and node, marking service nodes and the allocated/free batch and interactive compute nodes; the map for cabinet C0-0 is not reproduced here)

Legend:
     nonexistent node
  S  service node (login, boot, lustrefs)
  ;  free interactive compute node
  -  free batch compute node
  A  allocated, but idle compute node
  ?  suspect compute node
  X  down compute node
  Y  down or admindown service node
  Z  admindown compute node

Available compute nodes: 4 interactive, 38 batch
4 Batch Jobs
Torque/PBS Batch Queue Management System
For submission and management of jobs in batch queues. Use for jobs with large resource requirements (long-running, # of cores, memory, etc.)

List all available queues:
  qstat -Q   (brief)
  qstat -Qf  (full)

rcasey@cray2:~> qstat -Q
Queue  Max  Tot  Ena  Str  Que  Run  Hld  Wat  Trn  Ext  T
batch  0    0    yes  yes                                 E

Show the status of jobs in all queues:
  qstat              (all queued jobs)
  qstat -u username  (only queued jobs for "username")
(Note: if there are no jobs running in any of the batch queues, this command will show nothing and just return the Linux prompt.)

rcasey@cray2:~/lustrefs/mpi_c> qstat
Job id  Name      User    Time Use  S  Queue
sdb     mpic.job  rcasey  0         R  batch
5 Batch Jobs
Common Job States
  Q: job is queued
  R: job is running
  E: job is exiting after having run
  C: job is completed after having run

Submit a job to the default batch queue:
  qsub filename
where "filename" is the name of a file that contains batch queue commands.
Command-line directives override batch script directives, i.e. "qsub -N newname script"; "newname" overrides "-N name" in the batch script.

Delete a job from the batch queues:
  qdel jobid
where "jobid" is the job ID number as displayed by the qstat command. You must be the owner of the job in order to delete it.
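Putting qsub, qstat, and qdel together, a typical session might look like the following sketch (the script name and job ID are illustrative):

  # Submit a job script; qsub prints the assigned job ID
  qsub mpic.job          # e.g. prints 1234.sdb

  # Watch your own jobs; the S column moves from Q (queued) to R (running) to C (completed)
  qstat -u $USER

  # Remove the job if it is no longer needed (you must be the job owner)
  qdel 1234.sdb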
6 Sample Batch Job Script

  #!/bin/bash
  #PBS -N jobname
  #PBS -j oe
  #PBS -l mppwidth=24
  #PBS -l walltime=1:00:00
  #PBS -q batch
  cd $PBS_O_WORKDIR
  date
  aprun -n 24 executable

PBS directives:
  -N: name of the job
  -j oe: combine standard output and standard error in a single file
  -l mppwidth: specifies number of cores to allocate to the job
  -l walltime: specifies maximum amount of wall clock time for the job to run (hh:mm:ss); default = 5 years
  -q: specifies which queue to submit the job to
7 Sample Batch Job Script
The PBS_O_WORKDIR environment variable is generated by Torque/PBS. It contains the absolute path to the directory from which you submitted your job, and it is required for Torque/PBS to find your executable files.
Linux commands can be included in the batch job script.
The value set in the aprun -n parameter should match the value set in the PBS mppwidth directive, i.e. "#PBS -l mppwidth=24" paired with "aprun -n 24 exe".
Request proper resources:
  If -n or mppwidth > 960, the job will be held in the queued state for a while and then deleted.
  If mppwidth < -n, an error message is produced: "apsched: claim exceeds reservation's nodecount".
  If mppwidth > -n, then OK.
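As a concrete illustration of the matching rule, a sketch of the relevant lines for a 48-core job (the core count and executable name are illustrative):

  #PBS -l mppwidth=48      # reserve 48 cores from Torque/PBS
  ...
  aprun -n 48 ./myapp      # launch 48 PEs; -n must not exceed mppwidth
  # aprun -n 64 ./myapp    # would fail with "apsched: claim exceeds reservation's nodecount"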
8 Performance Analysis: Overview
The performance analysis process consists of three basic steps:
  1. Instrument your program, to specify what kind of data you want to collect under what conditions
  2. Execute your instrumented program, to generate and capture the desired data
  3. Analyze the resulting data
9 Performance Analysis: Overview
CrayPat, Perftools
Cray's toolkit for instrumenting executables and producing data from runs.
Two basic types of analyses are available:
  Sampling/Profiling: samples program counters at fixed intervals
  Tracing: traces function calls
The type of analysis is guided by build options and environment variables.
Profile/trace function calls and loops; produce call graphs and execution profiles.
Adds some overhead to the executable and increases runtime.
10 Performance Analysis: Overview
CrayPat, Perftools
Outputs data in binary format, which can be converted to text format, i.e. reports that contain statistical information.
CrayPat supports many languages and extensions: C, C++, Fortran, MPI, OpenMP.
Use of binary instrumentation means relatively low overhead and no interference with compiler optimizations. Cray performance is dependent on compiler optimizations (loop vectorization especially), so this is a necessity for CrayPat.
Sampling instrumentation results in some overhead (< 2-3%).
Logfiles from runs are generally compact.
Check "man craypat", "pat_help", and the CrayDoc manual "Using Cray Performance Analysis Tools" for more info.
11 Performance Analysis: Workflow
Load the Cray, perftools, and craypat modules before compiling:
  module load PrgEnv-cray
  module load perftools
  module load xt-craypat
Compile code using the Cray compiler wrappers (cc, CC, ftn). Make sure object files (*.o) are retained:
  C:       cc -c exe.c,    then cc -o exe exe.o
  C++:     CC -c exe.c,    then CC -o exe exe.o
  Fortran: ftn -c exe.f90, then ftn -o exe exe.o
If you use Makefiles, modify them to retain object files.
12 Performance Analysis: Workflow
Generate an instrumented executable:
  pat_build [options] exe
Creates an instrumented executable "exe+pat".
Execute the instrumented code:
  aprun -n 1 exe+pat
Creates the file exe+pat+pid.xf (PID = process ID).
Generate reports:
  pat_report [options] exe+pat+pid.xf
Outputs performance reports (the "rpt" text file).
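The instrument/run/report steps can be wrapped in a single batch script. A minimal sketch, assuming an executable named exe built as on the previous slide (the job name, core count, and walltime are illustrative); pat_build and pat_report can equally be run by hand on the login node:

  #!/bin/bash
  #PBS -N craypat_run
  #PBS -j oe
  #PBS -l mppwidth=24
  #PBS -l walltime=0:30:00
  #PBS -q batch
  cd $PBS_O_WORKDIR

  pat_build exe                    # instrument (sampling/profiling by default); writes exe+pat
  aprun -n 24 ./exe+pat            # run the instrumented executable; writes exe+pat+<PID>.xf
  pat_report exe+pat+*.xf > rpt    # convert the .xf data to a text report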
13 Performance Analysis: Workflow
pat_build
By default, pat_build instruments code for sampling/profiling.
To instrument code for tracing, include one or several options: -w, -u, -g, -O, -T, -t
  pat_build -w exe             (enable tracing)
  pat_build -u exe             (trace user-defined functions only)
  pat_build -g tracegroup exe  (enable trace groups)
  pat_build -O reports exe     (enable predefined reports)
  pat_build -T funcname exe    (trace a specific function by name)
  pat_build -t funclist exe    (trace a list of functions by name)
Control instrumented program behavior and data collection with 50+ optional runtime environment variables. For example:
  To generate more detailed reports: export PAT_RT_SUMMARY=0
  To measure MPI load imbalance: export PAT_RT_MPI_SYNC=1 for tracing, export PAT_RT_MPI_SYNC=0 for sampling
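Runtime environment variables take effect when they are exported in the environment that launches the instrumented run, i.e. before the aprun line in the batch script. A short sketch (the executable name and core count are illustrative):

  export PAT_RT_SUMMARY=0     # keep full detail rather than summarized data, for more detailed reports
  export PAT_RT_MPI_SYNC=1    # measure MPI load imbalance (tracing experiments)
  aprun -n 24 ./exe+pat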
14 Performance Analysis: Workflow
Trace Groups
Instrument code to trace all function references belonging to a specified group. 30+ trace groups are available:
  pat_build -g tracegroup exe
For example, to trace MPI calls, I/O calls, and memory references:
  pat_build -g mpi,io,heap exe

Trace Group   Description
mpi           MPI calls
omp           OpenMP calls
stdio         Application I/O calls
sysio         System I/O calls
io            stdio and sysio
lustre        Lustre file system calls
heap          Memory references
15 Performance Analysis: Workflow
Predefined reports
30+ predefined reports. Use the pat_report -O option. For example:
  To show data by function name only: pat_report -O profile exe+pat+pid.xf
  To show the calling tree:           pat_report -O calltree exe+pat+pid.xf
  To show load balance across PEs:    pat_report -O load_balance exe+pat+pid.xf

Report Option             Description
profile                   Show function names only
calltree                  Show calling tree top-down
load_balance              Show load balance across PEs
heap_hiwater              Show max memory usage
loops                     Show loop counts
read_stats, write_stats   Show I/O statistics
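Because pat_report only reads the .xf data file, several report views can be generated from a single instrumented run. A sketch (the .xf file name is illustrative):

  pat_report -O profile      exe+pat+12345.xf > rpt_profile
  pat_report -O calltree     exe+pat+12345.xf > rpt_calltree
  pat_report -O load_balance exe+pat+12345.xf > rpt_load_balance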
16 Performance Analysis: Workflow
Predefined Experiments
Instrument code using preset environments. 9 predefined experiments. Choose the experiment by setting the PAT_RT_EXPERIMENT environment variable. For example:
  To sample program counters at regular intervals:
    export PAT_RT_EXPERIMENT=samp_pc_time   (default)
    Default sampling interval = 10,000 microseconds
    Change the sampling interval with PAT_RT_INTERVAL, PAT_RT_INTERVAL_TIMER
  To trace function calls:
    export PAT_RT_EXPERIMENT=trace
    One of the pat_build trace options must be specified (-g, -u, -t, -T, -O, -w)
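For example, to keep the default sampling experiment but sample more frequently, both variables can be set before the run. A sketch (the interval value is illustrative and, per the default quoted above, is taken to be in microseconds):

  export PAT_RT_EXPERIMENT=samp_pc_time   # program-counter sampling (the default experiment)
  export PAT_RT_INTERVAL=5000             # sample every 5,000 microseconds instead of 10,000
  aprun -n 24 ./exe+pat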
17 Performance Analysis: Workflow
Predefined Hardware Performance Counter Groups
Build and instrument code as usual. Set the PAT_RT_HWPC environment variable (i.e. export PAT_RT_HWPC=3).
20 predefined groups are available:
  Summary
  L1, L2, L3 cache data accesses and misses
  Bandwidth info
  HyperTransport info
  Cycles stalled, resources idle/full
  Instructions and branches
  Instruction caches
  Cache hierarchy
  FP operations mix, vectorization, single-precision, double-precision
  Prefetches
See "man hwpc" for the full list and group numbers.
For summary data: export PAT_RT_HWPC=0
Shows MFLOPS, MIPS, computational intensity (FP ops / memory access), etc.
18 Performance Analysis: Reports

  #include <mpi.h>
  #include <stdio.h>

  #define N 1024           /* original value lost in transcription; illustrative */
  #define LOOPCNT 10000    /* original value lost in transcription; illustrative */

  void loop(float a[], float b[], float c[]);

  int main(int argc, char *argv[])
  {
    int i, rank;
    float a[N], b[N], c[N];

    for (i = 0; i < N; i++) {
      a[i] = i * 1.0;
      b[i] = i * 1.0;
    }

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < LOOPCNT; i++) {
      loop(a, b, c);
    }

    MPI_Finalize();

    if (rank == 0) {
      for (i = 0; i < N; i++) {
        printf("c[%d]= %f\n", i, c[i]);
      }
    }
    return 0;
  }

  void loop(float a[], float b[], float c[])
  {
    int i, numprocs;
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);   /* numprocs is queried but not otherwise used */
    for (i = 0; i < N; i++) {
      c[i] = a[i] + b[i];
    }
  }
19 Performance Analysis: Reports
Default profiling:
  cc -c exe.c ; cc -o exe exe.o ; pat_build exe ; pat_report *.xf > rpt

CrayPat/X:  Version 5.1 Revision 3746 (xf 3586)  08/20/10 16:46:28
Number of PEs (MPI ranks):   6
Numbers of PEs per Node:     6 PEs on 1 Node
Numbers of Threads per PE:   1 thread on each of 6 PEs
Number of Cores per Socket:  12
Execution start time:  Mon Apr 18 13:23
System name, type, and speed:  x86_ MHz

Table 2: Profile by Group, Function, and Line
  Samp %  Samp  Imb. Samp  Imb. Samp %  Group / Function / Source / Line  (PE='HIDE')
  100.0%  ...                           Total
     ...                                USER
     ...                                  subfunc  (rcasey/perform/exe_c/exe.c)
     ...                                    line.46  <= for loop in subfunc function
20 Performance Analysis: Reports
Profile function calls:
  pat_build exe ; pat_report -O profile *.xf > rpt

Table 1: Profile by Function Group and Function
  Samp %  Samp  Group / Function
  100.0%     2  Total
   50.0%     1  ETC   vfprintf
   50.0%     1  USER  subfunc
21 Performance Analysis: Reports
Profile user function calls:
  pat_build -u exe ; pat_report *.xf > rpt

Table 1: Profile by Function Group and Function
  Time %  Time  Calls  Group / Function
  100.0%  ...   ...    Total
     ...               USER
     ...                 main
   23.7%                 subfunc
22 Performance Analysis: Reports
Combine MPI calls, I/O calls, and memory references:
  pat_build -g mpi,io,heap exe ; pat_report *.xf > rpt

Table 1: Profile by Function Group and Function
  Time %  Time  Calls  Group / Function
  100.0%               Total
     ...               STDIO  printf
   20.1%               USER
     ...                 subfunc
    3.2%                 main

Table 8: File Output Stats by Filename
  Write Time  Write MB  Write Rate MB/sec  Writes  Write B/Call  File Name
  ...         ...       ...                ...     ...           Total
  ...         ...       ...                ...     ...           stdout

Table 9: Wall Clock Time, Memory High Water Mark
  Process Time  Process HiMem (MBytes)
  ...           ...                      Total

Table 2: Load Balance with MPI Message Stats
  Time %  Time  Group
  100.0%        Total
     ...        STDIO
   19.8%        USER
23 Performance Analysis: Reports
Loop statistics:
  cc -c -h profile_generate exe.c ; cc -o exe exe.o ; pat_build exe ; pat_report *.xf > rpt

Table 1: Loop Stats from -hprofile_generate
  Loop    Loop  Loop   Loop   Loop   Loop    Function=/.LOOP\. U.B.
  Time    Hit   Trips  Trips  Trips  Notes
                Avg    Min    Max
  100.0%                                     Total
     ...                             vector  main.loop.0.li
     ...                             novec   main.loop.1.li
     ...                             novec   main.loop.2.li
     ...                             vector  subfunc.loop.0.li.47
24 Performance Analysis: Reports
I/O statistics:
  pat_build -O write_stats exe ; pat_report *.xf > rpt

Table 1: File Output Stats by Filename
  Write Time  Write MB  Write Rate MB/sec  Writes  Write B/Call  File Name
  ...         ...       ...                ...     ...           Total
  ...         ...       ...                ...     ...           stdout