JUQUEEN: Best Practices. Florian Janetzko, 29 November 2013. Member of the Helmholtz Association (Mitglied der Helmholtz-Gemeinschaft).
2 Outline
  Production Environment
    Module Environment
    Job Execution
  Basic Porting
    Compilers and Wrappers
    Compiler/Linker Flags
  Tuning Applications
    Advanced Compiler/Linker Flags
    QPX
3 JUQUEEN System Architecture
  IBM Blue Gene/Q JUQUEEN
  - IBM PowerPC A2, 1.6 GHz, 16 cores/node, 4-way SMT, 64-bit
  - 4-wide (double precision) SIMD with FMA
  - 16 GB RAM per node
  - Torus network
  - 28 racks, 458,752 cores
  - 5.9 Petaflop/s peak
  - Connected to a Global Parallel File System (GPFS) with 10 PByte online disk and 37 PByte offline tape capacity
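The core count and peak figure on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration: the value of 1024 nodes per rack is an assumption (it is not stated on the slide, but it reproduces the quoted 458,752 cores), and the peak rate assumes every core issues one 4-wide fused multiply-add per cycle.

```python
# Figures taken from the slide: 28 racks, 16 cores per node, 1.6 GHz,
# 4-wide double-precision SIMD with fused multiply-add (FMA).
RACKS = 28
NODES_PER_RACK = 1024   # assumption: not stated on the slide
CORES_PER_NODE = 16
CLOCK_HZ = 1.6e9
SIMD_WIDTH = 4
FLOPS_PER_FMA = 2       # one multiply plus one add


def total_cores():
    """Number of cores in the whole machine."""
    return RACKS * NODES_PER_RACK * CORES_PER_NODE


def peak_flops():
    """Theoretical peak in floating-point operations per second."""
    flops_per_cycle = SIMD_WIDTH * FLOPS_PER_FMA
    return total_cores() * CLOCK_HZ * flops_per_cycle


if __name__ == "__main__":
    print(f"cores: {total_cores():,}")
    print(f"peak:  {peak_flops() / 1e15:.2f} PFlop/s")
```

Under these assumptions the result rounds to the 5.9 Petaflop/s quoted above; real applications reach only a fraction of this figure.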
4 JUQUEEN Challenges
  Chip
  - 4-way SMT with 1 integer + 1 FPU instruction per cycle: filling the pipes
  - 4-wide SIMD: efficient vectorization
  Memory
  - 1 GB/core, and 0.5 GB per task for pure MPI codes: memory consumption
  - Hardware support for transactional memory: efficient usage
  Network
  - Torus network: mapping of tasks, communicators (communication pattern)
  I/O
  - Processing large amounts of data: efficient I/O strategy and management
  Parallelism
  - MPP system: scalability
6 Module Environment
  Module concept
  - Provides an overview of available software packages
  - Eases the use of software packages
    - Access to software packages and libraries
    - Supply of different versions of applications
    - Supply of application-specific information
  - Enables dynamic modification of the user's environment
    - Environment variables (PATH, LD_LIBRARY_PATH, MANPATH, ...) are set appropriately
    - Detection of conflicts between applications
7 Module Environment
  $ module <options> <module>

  Option        Description
  <no option>   Lists available options of the module command
  avail         Lists all available modules
  list          Lists modules currently loaded
  load          Loads a module
  unload        Unloads a module
  help          Lists information about a module
  show          Information about settings done by the module
  purge         Unloads all modules
8 Module Environment
  Six module categories
    Category     Content
    COMPILER     Different compilers and versions of compilers
    IO           I/O libraries and tools
    MATH         Mathematical libraries and software packages
    MISC         Software not fitting into another category
    SCIENTIFIC   Software packages from different scientific fields
    TOOLS        Performance analysis, debugger, etc.
  Software for
  - Compute nodes: /bgsys/local
  - Front-end nodes: /usr/local
9 Module Environment
  $ module avail
    /bgsys/local/modulefiles/tools
    /bgsys/local/modulefiles/scientific
      cp2k/
      cpmd/
      cpmd/3.15.1_a
      cpmd/3.15.1_b
      cpmd/3.15.1_c(default)
      lammps/30aug12
      lammps/5may12(default)
      libint/
      namd/2.8
      namd/2.9(default)
    /usr/local/modulefiles/compiler
      cmake/2.8.8(default)
    /usr/local/modulefiles/math
    /usr/local/modulefiles/scientific
    /usr/local/modulefiles/io
    /usr/local/modulefiles/tools
      UNITE/
    /usr/local/modulefiles/misc
10 Module Environment: Applications & Libraries
  Selected mathematical applications and libraries
    arpack (2.1)          gsl (1.15)       mumps (4.10.0)     scalapack (2.0.2)
    fftw (2.1.5, 3.3.3)   hypre (2.9.0)    parmetis (4.0.2)   sprng (2.0)
    gmp (5.0.5)           lapack (3.4.2)   petsc (3.4.2)      sundials (2.5.0)
  Selected scientific applications
    CPMD (3.15.3)   Gromacs (4.5.5)             OpenFOAM (2.1.1)
    CP2K ( )        Lammps (5May12, 30Aug12)    QE (5.0.1)
    GPAW*           Namd (2.8, 2.9)             VASP**
  *  In preparation
  ** Software not installed, but makefiles are available
11 Module Environment: Applications & Libraries
  Selected I/O libraries and tools
    Darshan (2.2.4)   netcdf (4.3)              SIONlib (1.4.3)
    HDF5 (1.8.11)     parallel-netcdf (1.3.1)
  Selected tools
    Cmake (2.8.11)   hpctoolkit (5.3.2)   Tau (2.22.3b4)
    Clang (3.4)      PAPI (5.1.1)         Totalview (8.12.0)
    extrae (2.4.1)   Scalasca (2.1)       Vampir (8.1)
13 Running Simulations: Batch System
  Execution of applications is managed by LoadLeveler
  - Users submit jobs using a job command file
  - LoadLeveler allocates computing resources to run jobs
  - The scheduling of jobs depends on
    - Availability of resources
    - Job priority (jobs with larger core counts are favored)
  - Jobs run in queues (job classes)
    - Classes are chosen by LoadLeveler according to the core count of the job
  For information about LoadLeveler on JUQUEEN see [...]
14 LoadLeveler Commands
  Command               Description
  llsubmit <jobfile>    Sends a job to the queuing system
  llq                   Lists all queued and running jobs
  llq -l <job ID>       Detailed information about the specified job
  llq -s <job ID>       Detailed information about a specific queued job, e.g. expected start time
  llq -u <user>         Lists all jobs of the specified user
  llcancel <job ID>     Kills the specified job
  llstatus              Displays the status of LoadLeveler
  llclass               Lists existing classes and their properties
  llqx                  Shows detailed information about all jobs
15 LoadLeveler Command Examples
  Submitting a batch job: llsubmit
    $ llsubmit batch-job.js
    llsubmit: Processed command file through Submit Filter: "/bgdata/admin/loadl/extensions/filter".
    llsubmit: The job "juqueen2c1.zam.kfa-juelich.de.35395" has been submitted.
  Query status of submitted jobs: llq
    $ llq -u userid
    Id          Owner    Submitted     ST   PRI   Class   Running On
    juqueen2c   userid   11/15 10:11   I    50    n001
    1 job step(s) in query, 1 waiting, 0 pending, 0 running, 0 held, 0 preempted
  Cancel a submitted job: llcancel
    $ llcancel juqueen2c
    llcancel: Cancel command has been sent to the central manager.
16 LoadLeveler Job Command File
  ASCII file containing two major parts
  1. LoadLeveler job keywords block at the beginning of the file
     - LoadLeveler keywords have the form #@<keyword>
     - # and @ can be separated by any number of blanks
  2. One or more application script blocks
     - Regular shell script
     - Can contain any shell command
17 LoadLeveler Standard Keywords
  Keyword                             Description
  #@job_name                          Name of the job
  #@notification = complete           Send notification if the job is finished
                   error              ... if the job returned a non-zero error code
                   never              ... never
                   start              ... upon the start of the job
                   always             ... combination of start, end, error
  [...]                               Mail address to send messages to
  #@wall_clock_limit                  Requested wall time for the job
  [...] file name>                    Specifies corresponding file names
  #@output = [...] name for stdout>
  #@error = [...] name for stderr>
  #@environment = [...] COPY_ALL]     Environment variable to be exported to the job
  #@queue                             Queue job
18 LoadLeveler Blue Gene/Q Keywords
  Keyword                            Description
  #@job_type = [...] bluegene]       Specifies the type of job step to process. Must be set to bluegene for parallel applications.
  #@bg_size = [...] of nodes>        Size of the Blue Gene job. The keywords bg_size and bg_shape are mutually exclusive.
  #@bg_shape = [...] Xa Xb Xc Xd     Specifies the requested shape of a job (in midplanes).
  [...]                              Whether the scheduler should consider all possible rotations of the given shape.
  #@bg_connectivity                  Type of wiring requested for the block (can be specified for each dimension separately).
19 LoadLeveler Job Classes
  Class name   #Nodes   Max. run time   Default run time
  n[...]       [...]    [...]:30:00     00:30:00
  n[...]       [...]    [...]:30:00     00:30:00
  n[...]       [...]    [...]:00:00     06:00:00
  n[...]       [...]    [...]:00:00     06:00:00
  m[...]       [...]    [...]:00:00     06:00:00
  m[...]       [...]    [...]:00:00     06:00:00
  m[...]       [...]    [...]:00:00     06:00:00
  m[...]       [...]    [...]:00:00     06:00:00
  m016*        [...]    [...]:00:00     06:00:00
  m032*        [...]    [...]:00:00     06:00:00
  m048*        [...]    [...]:00:00     06:00:00
  m056*        [...]    [...]:00:00     06:00:00
  * On demand only
  You will be charged for the full partition (e.g. if you request 513 nodes you will be charged for 1024 nodes!)
  Always use full partitions!
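The charging rule can be illustrated with a small helper that rounds a request up to the next partition size. This is a sketch only: the list `PARTITION_SIZES` is hypothetical (the real class table did not survive in this transcript); the one case confirmed by the slide is 513 requested nodes being charged as 1024.

```python
from bisect import bisect_left

# Hypothetical partition sizes in nodes; the real list is site-specific
# and is NOT taken from the slide.
PARTITION_SIZES = (32, 64, 128, 256, 512, 1024, 2048)


def charged_nodes(requested, sizes=PARTITION_SIZES):
    """Return the smallest partition size that can hold `requested` nodes."""
    if requested < 1:
        raise ValueError("requested must be at least 1")
    idx = bisect_left(sizes, requested)
    if idx == len(sizes):
        raise ValueError("request exceeds the largest partition")
    return sizes[idx]


def wasted_fraction(requested, sizes=PARTITION_SIZES):
    """Fraction of the charged partition that the job does not use."""
    charged = charged_nodes(requested, sizes)
    return (charged - requested) / charged


if __name__ == "__main__":
    for nodes in (512, 513, 1024):
        print(nodes, charged_nodes(nodes), f"{wasted_fraction(nodes):.1%}")
```

With these assumed sizes, a request for 513 nodes pays for almost twice what it uses, which is why the slide recommends always requesting full partitions.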
20 LoadLeveler Job Scheduling
  Backfill scheduler
  - The biggest job has the highest priority (Top Dog)
  - LoadLeveler fills gaps with smaller, short-running jobs while freeing the system for the Top Dog
  Tip: Specify the wall time for your jobs as accurately as possible, because jobs requesting a shorter wall time have a better chance of being executed.
  Big jobs
  - Jobs requesting 8 racks are collected and run in dedicated time slots (e.g. after a maintenance) at least once a week
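Why a tight wall-time request helps can be seen in a toy model of backfilling. The sketch below is not LoadLeveler's actual algorithm; the `Job` class, the function name, and the numbers are all hypothetical. It only encodes the idea that a job can slip into a gap if it fits into the free nodes and is guaranteed to finish before the Top Dog needs them.

```python
from dataclasses import dataclass


@dataclass
class Job:
    nodes: int          # number of nodes requested
    wall_hours: float   # requested wall-clock limit in hours


def can_backfill(job, free_nodes, hours_until_top_dog):
    """Toy rule: the job must fit in the gap in both space and time."""
    fits_in_space = job.nodes <= free_nodes
    fits_in_time = job.wall_hours <= hours_until_top_dog
    return fits_in_space and fits_in_time


if __name__ == "__main__":
    gap_nodes, gap_hours = 512, 2.0
    precise = Job(nodes=512, wall_hours=1.5)    # realistic estimate
    generous = Job(nodes=512, wall_hours=6.0)   # default limit left untouched
    print(can_backfill(precise, gap_nodes, gap_hours))   # fits the gap
    print(can_backfill(generous, gap_nodes, gap_hours))  # has to wait
```

In this model the two jobs are identical except for the requested wall time, and only the one with the realistic estimate can start early.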
21 Running Simulations: runjob Command
  Launch command for parallel applications
    runjob [options]
    runjob [options] : <executable> [arguments]

  Option                      Description
  --args <prg_arg>            Passes "prg_arg" to the launched application on the compute node
  --exe <executable>          Specifies the full path to the executable
  --envs <ENV_Var=Value>      Sets the environment variable ENV_Var=Value
  --exp-env <ENV_Var>         Exports the environment variable ENV_Var from the current environment to the job
  --np <number>               Total number of (MPI) tasks
  --ranks-per-node <number>   Number of (MPI) tasks per compute node
22 LoadLeveler Example Job Command File I
    #@job_name = MPI_code
    #@comment = "32 ranks per node"
    #@output = test_$(jobid)_$(stepid).out
    #@error = test_$(jobid)_$(stepid).err
    #@environment = COPY_ALL
    #@job_type = bluegene
    #@notification = never
    #@bg_size = 512
    #@bg_connectivity = torus
    #@wall_clock_limit = 14:00:00
    #@queue
    runjob --np <number> --ranks-per-node 32 --exe app.x
  Pure MPI applications need to use 32 tasks per node in order to use the architecture efficiently!
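The value passed to --np has to be consistent with bg_size and --ranks-per-node. The helper below is a hypothetical sketch (the function name is not part of any tool) that builds the launch line under the assumption that the job fills every node of the partition, so the total task count is simply nodes times ranks per node.

```python
def runjob_command(bg_size, ranks_per_node, exe):
    """Build a runjob line, assuming all nodes are fully populated."""
    if bg_size < 1 or ranks_per_node < 1:
        raise ValueError("bg_size and ranks_per_node must be positive")
    total_tasks = bg_size * ranks_per_node  # assumption: no idle nodes
    return (
        f"runjob --np {total_tasks} "
        f"--ranks-per-node {ranks_per_node} --exe {exe}"
    )


if __name__ == "__main__":
    # Values from the job command file above: 512 nodes, 32 ranks per node.
    print(runjob_command(512, 32, "app.x"))
```

A smaller --np is also possible when not every node should be filled; the product is the upper bound for a given partition.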
23 Running Simulations: MPI/OpenMP Codes
  On Blue Gene/P
  - Three modes were available
    1. VN mode (4 MPI tasks, no threads per task)
    2. DUAL mode (2 MPI tasks with 2 OpenMP threads each)
    3. SMP mode (1 MPI task with 4 OpenMP threads)
  On Blue Gene/Q
  - One node has 16 cores with 4-way SMT each
  - Several configurations are possible
    - ntasks × nthreads = 64
    - ntasks = 2^n, 0 ≤ n ≤ 6
  Test carefully which configuration gives the best performance for your application and setup!
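The possible task/thread combinations per node can be enumerated directly from the two conditions on the slide. The sketch below also divides the 16 GB of node memory evenly among the tasks; that memory figure is a rough upper bound that ignores what the operating system and MPI buffers take, and the function name is only illustrative.

```python
NODE_MEMORY_GB = 16   # RAM per node, from the system architecture slide
HW_THREADS = 64       # 16 cores x 4-way SMT


def configurations():
    """Return (ntasks, nthreads, GB per task) for ntasks = 2**n, n = 0..6."""
    configs = []
    for n in range(7):
        ntasks = 2 ** n
        nthreads = HW_THREADS // ntasks   # ntasks * nthreads = 64
        configs.append((ntasks, nthreads, NODE_MEMORY_GB / ntasks))
    return configs


if __name__ == "__main__":
    print("tasks  threads  GB/task")
    for ntasks, nthreads, mem in configurations():
        print(f"{ntasks:5d}  {nthreads:7d}  {mem:7.2f}")
```

The row with 32 tasks and 2 threads each corresponds to the 0.5 GB per task mentioned for pure MPI codes, and the row with 16 tasks and 4 threads to a hybrid 16x4 layout.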
24 LoadLeveler Example Job Command File II
    #@job_name = hybrid_code
    #@comment = "16x4 configuration"
    #@output = test_$(jobid)_$(stepid).out
    #@error = test_$(jobid)_$(stepid).err
    #@environment = COPY_ALL
    #@job_type = bluegene
    #@notification = never
    #@bg_size = 512
    #@bg_connectivity = torus
    #@wall_clock_limit = 14:00:00
    #@queue
    runjob --np <number> --ranks-per-node 16 \
           --envs OMP_NUM_THREADS=4 : app.x -i input
25 Monitoring of Jobs
  LoadLeveler
  - llq [options]
  LLview
  - Client-server based application
  - Compact summary of different information (e.g. current usage of the system, job prediction, expected and average waiting times, ...)
  - Customizable
  - Developed by W. Frings (JSC)
26 LLview
28 Compilers
  Different compilers for front-end and compute nodes
  GNU and IBM XL families of compilers are available
  Tip: It is recommended to use the XL suite of compilers for the compute nodes (CN), since they generally produce better optimized code.

  Language   XL compiler                                                    GNU compiler
  C          xlc, xlc_r                                                     gcc
  C++        xlc++, xlc++_r, xlC, xlC_r                                     g++
  Fortran    xlf, xlf90, xlf95, xlf2003, xlf_r, xlf90_r, xlf95_r, xlf2003_r gfortran
29 Compilers for CN
  GNU compilers
  Language   Compiler invocation            MPI wrapper
  C          powerpc64-bgq-linux-gcc        mpigcc
  C++        powerpc64-bgq-linux-g++        mpig++
  Fortran    powerpc64-bgq-linux-gfortran   mpigfortran

  XL compilers
  Language   Compiler invocation (thread-safe: *_r)   MPI wrapper (thread-safe: *_r)
  C          bgxlc, bgc89, bgc99                      mpixlc
  C++        bgxlc++, bgxlC                           mpixlcxx
  Fortran    bgxlf, bgxlf90, bgxlf95, bgxlf2003       mpixlf77, mpixlf90, mpixlf95, mpixlf[...]

  Tip: To be on the safe side, always use the corresponding MPI wrappers with _r when compiling for the CNs.
30 Basic Compiler/Linker Options: XL Compilers I
  Flags in order of increasing optimization potential

  Optimization level                   Description
  -O2 -qarch=qp -qtune=qp              Basic optimization
  -O3 -qstrict -qarch=qp -qtune=qp     More aggressive, no impact on accuracy
  -O3 -qhot -qarch=qp -qtune=qp        More aggressive, may influence accuracy (high-order transformations of loops)
  -O4 -qarch=qp -qtune=qp              Interprocedural optimization at compile time
  -O5 -qarch=qp -qtune=qp              Interprocedural optimization at link time, whole-program analysis

  Some flags need to be used during compilation AND linking. Check the compiler manual. If you are not sure, also include the flags used for compiling in the linking step!
31 Basic Compiler/Linker Options: XL Compilers II
  Additional compiler flags

  Compiler/Linker flag      Description
  -qsmp=omp -qthreaded      Switch on OpenMP support
  -qreport -qlist           Generates for each source file <name> a file <name>.lst with pseudo code and a description of the code optimizations that were performed
  -qessl -lessl[smp]bg      Compiler attempts to replace some intrinsic Fortran 90 procedures by ESSL routines where it is safe to do so

  How to link with ESSL routines: see [...]
33 Diagnostic Compiler Flags (XL Compilers)
  Diagnostic messages are given on the terminal and/or in a separate file
    -qreport    compilers generate a file name.lst for each source file
    -qlist      compiler listing including an object listing
    -qlistopt   options in effect during compilation are included in the listing
  Listen to the compiler!
    -qflag=<listing-severity>:<terminal-severity>
      i: informational messages, w: warning messages, s: severe errors
      Use -qflag=i:i to get all information
    -qlistfmt=(xml|html)=<option>
34 Example: Compiler Diagnostics
  Source:
    subroutine mult(c,a,ndim)
      implicit none
      integer :: ndim,i,j
      double precision :: a(ndim),c(ndim,ndim)
      ! Loop
      do i=1,1000
        do j=1,1000
          c(i,j) = a(i)
        enddo
      enddo
    end subroutine mult
  Compiler report:
    >>>>> LOOP TRANSFORMATION SECTION <<<<<
    1 SUBROUTINE mult (c, a, ndim)
    [...]
    Id=1 DO $$CIV2 = $$CIV2,
    IF (.FALSE.) GOTO lab_11
    $$LoopIV1 = 0
    Id=2 [...]
    DO $$LoopIV1 = $$LoopIV1,
    Loop interchanging applied to loop nest
    Outer loop has been unrolled 8 time(s).
35 Single-Core Optimization: Compiler/Linker Flags
  -qessl
  - For Fortran codes
  - If either -lessl or -lesslsmp is also specified, ESSL routines are used in place of some Fortran 90 intrinsic procedures when there is a safe opportunity to do so.
  -qipa (compiling and linking step)
  - Turns on or customizes interprocedural analysis (IPA)
  - High potential for performance benefits
  - May considerably increase the time for the compiling and linking steps!
36 Single-Core Optimization: Compiler/Linker Flags
  -qinline[=auto:level=5, +procedure1[:procedure2[: ...]], ...]
  - Attempts to inline procedures instead of generating calls to those procedures, for improved performance.
  - Several suboptions are available; check the man page.
  -qsimd=auto
  - Enables the automatic generation of vector instructions
  -qtm
  - Enables support for transactional memory
  - Use only with thread-safe compiler wrappers
  -qunroll=yes
  - Instructs the compiler to search for more opportunities for loop unrolling than it does with auto.
38 Quad Floating Point Extension Unit (QPX)
  4 double precision pipelines, usable as:
  - scalar FPU
  - 4-wide FPU SIMD (Single Instruction Multiple Data)
  - 2-wide complex arithmetic SIMD
  8 concurrent floating point ops (FMA) + load + store
39 IBM XL Compiler Support for QPX
  Usage of QPX
  - Compiler flag -qsimd=auto
  - Check that SIMD vectorization is actually done: -qreport -qlist
      >>>> LOOP TRANSFORMATION SECTION <<<<
      [...]
      Loop with nest-level 1 and iteration count 1000 was SIMD vectorized
      [...]
      >>>> LOOP TRANSFORMATION SECTION <<<<
      [...]
      Loop was not SIMD vectorized because the loop is not the innermost loop
      Loop was not SIMD vectorized because it contains memory references with non-vectorizable alignment.
40 QPX Usage: Hints for the Compiler
  The compiler needs hints
  - Tell the compiler the likely iteration counts
  - Instruct the compiler to align fields
  - Tell it that Fortran assumed-shape arrays are contiguous: -qassert=contig
  Fortran:
    real*8 :: x(:),y(:),a
    !ibm* align(32, x, y)
    !ibm* assert(itercnt(100))
    do i=m, n
      z(i) = x(i) + a*y(i)
    enddo
  C:
    double align(32) *x, *y;
    double a;
    #pragma disjoint(*x, *y)
    #pragma disjoint(*x, a)
    #pragma ibm iterations(100)
    for (int i=m;i<n;i++)
      z[i] = x[i] + a*y[i];

    void foo(double* restrict a1, double* restrict a2)
    {
      for (int i=0; i<n; i++)
        a1[i]=a2[i];
    }
41 IBM XL QPX Intrinsics
  New intrinsic variable type
  - C/C++: vector4double
  - Fortran: vector(real(8))
  Wide set of elemental functions available
  - LOAD, STORE, MULT, MULT-ADD, ROUND, CEILING, SQRT, ...
  Strengths:
  - The user may lay out the calculation by hand if the compiler is not smart enough (e.g. where there is no loop)
  - Easy to use: leave stack, register layout, and load/store scheduling to the compiler
42 QPX Example Using Compiler Intrinsics
    typedef vector4double qv;
    qv dx,dy,dz,dx2,dy2,dz2;
    for (i=0;i<4;i++) {
      xd[i] = xdipl[j];
      yd[i] = ydipl[j];
      zd[i] = zdipl[j];
    }
    dx2 = vec_mul(dx,dx);
    dy2 = vec_mul(dy,dy);
    dz2 = vec_mul(dz,dz);
    d = vec_swsqrt(dx2+dy2+dz2);
  Source: IBM Corporation
43 User Information and Support
  Information about JUQUEEN
  - JSC websites at [...]
  - IBM Blue Gene/Q Application Development Redbook
  Dispatch and User Support
  - Applications for accounts (for approved projects): Forschungszentrum Jülich GmbH, JSC, Dispatch, Jülich. Tel: [...], Fax: [...]
  - User Support. Tel: [...]
44 Workshop Announcement
  Second JUQUEEN Porting and Tuning Workshop
  3-5 February 2014, Forschungszentrum Jülich, Jülich Supercomputing Centre
  Contact: Dr. Dirk Brömmel, d.broemmel(at)fz-juelich.de
More informationCode Optimization. Brandon Barker Computational Scientist Cornell University Center for Advanced Computing (CAC)
Code Optimization Brandon Barker Computational Scientist Cornell University Center for Advanced Computing (CAC) brandon.barker@cornell.edu Workshop: High Performance Computing on Stampede January 15, 2015
More informationIntroduction to OpenMP
Introduction to OpenMP Ekpe Okorafor School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 A little about me! PhD Computer Engineering Texas A&M University Computer Science
More informationJÜLICH SUPERCOMPUTING CENTRE Site Introduction Michael Stephan Forschungszentrum Jülich
JÜLICH SUPERCOMPUTING CENTRE Site Introduction 09.04.2018 Michael Stephan JSC @ Forschungszentrum Jülich FORSCHUNGSZENTRUM JÜLICH Research Centre Jülich One of the 15 Helmholtz Research Centers in Germany
More informationProgramming Environment 4/11/2015
Programming Environment 4/11/2015 1 Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent interface
More informationShared Memory Programming With OpenMP Exercise Instructions
Shared Memory Programming With OpenMP Exercise Instructions John Burkardt Interdisciplinary Center for Applied Mathematics & Information Technology Department Virginia Tech... Advanced Computational Science
More informationOpen Multi-Processing: Basic Course
HPC2N, UmeåUniversity, 901 87, Sweden. May 26, 2015 Table of contents Overview of Paralellism 1 Overview of Paralellism Parallelism Importance Partitioning Data Distributed Memory Working on Abisko 2 Pragmas/Sentinels
More informationFujitsu s Approach to Application Centric Petascale Computing
Fujitsu s Approach to Application Centric Petascale Computing 2 nd Nov. 2010 Motoi Okuda Fujitsu Ltd. Agenda Japanese Next-Generation Supercomputer, K Computer Project Overview Design Targets System Overview
More informationKISTI TACHYON2 SYSTEM Quick User Guide
KISTI TACHYON2 SYSTEM Quick User Guide Ver. 2.4 2017. Feb. SupercomputingCenter 1. TACHYON 2 System Overview Section Specs Model SUN Blade 6275 CPU Intel Xeon X5570 2.93GHz(Nehalem) Nodes 3,200 total Cores
More informationIntroduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah
Introduction to Parallel Programming Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Types of parallel computers. Parallel programming options. How to
More information2014 LENOVO. ALL RIGHTS RESERVED.
2014 LENOVO. ALL RIGHTS RESERVED. Parallel System description. Outline p775, p460 and dx360m4, Hardware and Software Compiler options and libraries used. WRF tunable parameters for scaling runs. nproc_x,
More informationOpenPBS Users Manual
How to Write a PBS Batch Script OpenPBS Users Manual PBS scripts are rather simple. An MPI example for user your-user-name: Example: MPI Code PBS -N a_name_for_my_parallel_job PBS -l nodes=7,walltime=1:00:00
More informationShared memory programming model OpenMP TMA4280 Introduction to Supercomputing
Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started
More informationBlue Gene/Q User Workshop. Debugging
Blue Gene/Q User Workshop Debugging Topics GDB Core Files Coreprocessor 2 GNU Debugger (GDB) The GNU Debugger (GDB) The Blue Gene/Q system includes support for running GDB with applications that run on
More informationIntroduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah
Introduction to Parallel Programming Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Types of parallel computers. Parallel programming options. How to
More informationIntroduction to GALILEO
Introduction to GALILEO Parallel & production environment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it Domenico Guida d.guida@cineca.it Maurizio Cremonesi m.cremonesi@cineca.it
More informationFFTSS Library Version 3.0 User s Guide
Last Modified: 31/10/07 FFTSS Library Version 3.0 User s Guide Copyright (C) 2002-2007 The Scalable Software Infrastructure Project, is supported by the Development of Software Infrastructure for Large
More informationBeginner's Guide for UK IBM systems
Beginner's Guide for UK IBM systems This document is intended to provide some basic guidelines for those who already had certain programming knowledge with high level computer languages (e.g. Fortran,
More informationHow to compile C/C++ program on SR16000
How to compile C/C++ program on SR16000 Center for Computational Materials Science, Institute for Materials Research, Tohoku University 2014.1 version 1.0 Contents 1. Compile... 1 1.1 How to use XL C/C++...
More informationIntroduction. No Optimization. Basic Optimizations. Normal Optimizations. Advanced Optimizations. Inter-Procedural Optimizations
Introduction Optimization options control compile time optimizations to generate an application with code that executes more quickly. Absoft Fortran 90/95 is an advanced optimizing compiler. Various optimizers
More informationAMD S X86 OPEN64 COMPILER. Michael Lai AMD
AMD S X86 OPEN64 COMPILER Michael Lai AMD CONTENTS Brief History AMD and Open64 Compiler Overview Major Components of Compiler Important Optimizations Recent Releases Performance Applications and Libraries
More informationUsing the IBM Opteron 1350 at OSC. October 19-20, 2010
Using the IBM Opteron 1350 at OSC October 19-20, 2010 Table of Contents Hardware Overview The Linux Operating System User Environment and Storage 2 Hardware Overview Hardware introduction Login node configuration
More informationIntroduction to OpenMP
1 Introduction to OpenMP NTNU-IT HPC Section John Floan Notur: NTNU HPC http://www.notur.no/ www.hpc.ntnu.no/ Name, title of the presentation 2 Plan for the day Introduction to OpenMP and parallel programming
More informationIntroduction to OpenMP
Introduction to OpenMP Le Yan Scientific computing consultant User services group High Performance Computing @ LSU Goals Acquaint users with the concept of shared memory parallelism Acquaint users with
More informationBatch Systems & Parallel Application Launchers Running your jobs on an HPC machine
Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike
More informationOpenACC Course. Office Hour #2 Q&A
OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle
More informationTECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 11th CALL (T ier-0)
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 11th CALL (T ier-0) Contributing sites and the corresponding computer systems for this call are: BSC, Spain IBM System X idataplex CINECA, Italy The site selection
More informationHigh Performance Computing Software Development Kit For Mac OS X In Depth Product Information
High Performance Computing Software Development Kit For Mac OS X In Depth Product Information 2781 Bond Street Rochester Hills, MI 48309 U.S.A. Tel (248) 853-0095 Fax (248) 853-0108 support@absoft.com
More informationThe Cray Programming Environment. An Introduction
The Cray Programming Environment An Introduction Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent
More informationIBM Scheduler for High Throughput Computing on IBM Blue Gene /P Table of Contents
IBM Scheduler for High Throughput Computing on IBM Blue Gene /P Table of Contents Introduction...3 Architecture...4 simple_sched daemon...4 startd daemon...4 End-user commands...4 Personal HTC Scheduler...6
More information[Scalasca] Tool Integrations
Mitglied der Helmholtz-Gemeinschaft [Scalasca] Tool Integrations Aug 2011 Bernd Mohr CScADS Performance Tools Workshop Lake Tahoe Contents Current integration of various direct measurement tools Paraver
More informationAlgorithms and Computation in Signal Processing
Algorithms and Computation in Signal Processing special topic course 18-799B spring 2005 22 nd lecture Mar. 31, 2005 Instructor: Markus Pueschel Guest instructor: Franz Franchetti TA: Srinivas Chellappa
More informationARM High Performance Computing
ARM High Performance Computing Eric Van Hensbergen Distinguished Engineer, Director HPC Software & Large Scale Systems Research IDC HPC Users Group Meeting Austin, TX September 8, 2016 ARM 2016 An introduction
More informationIBM PSSC Montpellier Customer Center. Content
Content IBM PSSC Montpellier Customer Center Standard Tools Compiler Options GDB IBM System Blue Gene/P Specifics Core Files + addr2line Coreprocessor Supported Commercial Software TotalView Debugger Allinea
More informationAutomatic Generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P
Automatic Generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P Franz Franchetti 1, Yevgen Voronenko 2, Gheorghe Almasi 3 1 University and SpiralGen, Inc. 2 AccuRay, Inc., 3 IBM Research
More informationIBM Blue Gene/Q solution
IBM Blue Gene/Q solution Pascal Vezolle vezolle@fr.ibm.com Broad IBM Technical Computing portfolio Hardware Blue Gene/Q Power Systems 86 Systems idataplex and Intelligent Cluster GPGPU / Intel MIC PureFlexSystems
More informationIFISS is a set of test problems from [Elman, Silvester, Wathen]. Convectiondiffusion problems cd2 and cd4 are the two of interest here.
1 Objective The aim of the work was to install the Trilinos software library on the HPCx service and then to investigate its performance using test problems from the ifiss suite. Trilinos is a collection
More informationIntroduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah
Introduction to Parallel Programming Martin Čuma Center for High Performance Computing University of Utah m.cuma@utah.edu Overview Types of parallel computers. Parallel programming options. How to write
More informationIntroduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah
Introduction to Parallel Programming Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Types of parallel computers. Parallel programming options. How to
More informationLecture 3: Intro to parallel machines and models
Lecture 3: Intro to parallel machines and models David Bindel 1 Sep 2011 Logistics Remember: http://www.cs.cornell.edu/~bindel/class/cs5220-f11/ http://www.piazza.com/cornell/cs5220 Note: the entire class
More informationTECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0)
TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx
More informationIntroduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines
Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi
More informationPROGRAMMING MODEL EXAMPLES
( Cray Inc 2015) PROGRAMMING MODEL EXAMPLES DEMONSTRATION EXAMPLES OF VARIOUS PROGRAMMING MODELS OVERVIEW Building an application to use multiple processors (cores, cpus, nodes) can be done in various
More information