Best Practices JUGENE. Florian Janetzko, Jülich Supercomputing Centre (JSC) Institute for Advanced Simulation, Forschungszentrum Jülich

1 Best Practices JUGENE Florian Janetzko, Jülich Supercomputing Centre (JSC) Institute for Advanced Simulation, Forschungszentrum Jülich

2 Outline A brief Overview Best Practice JUGENE System Architecture and Configuration Production Environment Basic Porting Tuning Applications Performance Analysis Debugging 2

3 Outline A brief Overview Best Practice JUGENE System Architecture and Configuration Production Environment Basic Porting Tuning Applications Performance Analysis Debugging 3

4 Forschungszentrum Jülich 4

5 Jülich Supercomputing Centre (JSC) 5

6 Tasks of the JSC Operation of the supercomputers for local, national and European scientists. User support in programming and optimisation of applications Support of research groups from different research domains by means of simulation laboratories Research & Development work on new computer architectures, algorithms, performance analysis and tools, GRID computing, etc. Education and training of users, education of students (bachelor and master courses, PhD programmes) 6

7 Integration of JSC in Networks and Alliances
Jülich Aachen Research Alliance (JARA): initiative of RWTH Aachen and Forschungszentrum Jülich
Gauss Centre for Supercomputing (GCS): alliance of 3 German centres, Höchstleistungsrechenzentrum Stuttgart (HLRS), Jülich Supercomputing Centre (JSC), Leibniz-Rechenzentrum (LRZ); German representative in PRACE
John von Neumann Institute for Computing (NIC): joint foundation of 3 Helmholtz centres, Deutsches Elektronen-Synchrotron (DESY), Gesellschaft für Schwerionenforschung (GSI), Jülich Supercomputing Centre (JSC) 7

8 Supercomputer Systems at Jülich: Dual Concept
2004: IBM Power 4+ JUMP, 9 TFlop/s (general-purpose)
2005/6: IBM Blue Gene/L JUBL, 45 TFlop/s (highly scalable)
2007/8: IBM Power 6 JUMP, 9 TFlop/s; IBM Blue Gene/P JUGENE, 223 TFlop/s; file server: GPFS
2009: Intel Nehalem clusters JUROPA (200 TFlop/s) and HPC-FF (100 TFlop/s); IBM Blue Gene/P JUGENE, 1 PFlop/s; file servers: GPFS, Lustre
Two lines of systems: General-Purpose and Highly-Scalable 8

9 User Information and Support
Information about JUGENE: JSC websites; PRACE JUGENE Best Practice Guide: Practice-Guides?lang=en
Dispatch and User Support
Applications for accounts: Forschungszentrum Jülich GmbH, JSC, Dispatch, Jülich, Tel: , Fax:
User Support: Tel:

10 Outline A brief Overview Best Practice JUGENE System Architecture and Configuration Production Environment Basic Porting Tuning Applications Performance Analysis Debugging 10

11 Blue Gene/P Design Goals Optimal performance / watt ratio Transparent high-speed reliable network Easy programming based on standard message passing interface (MPI) Extreme scalability ( > 512K cores) High reliability 11

12 Blue Gene/P Processor and Chip
Microprocessor: PowerPC 450, 32-bit, 850 MHz; double-precision floating-point multiply-add unit (double FPU); 3.4 GFLOP/s
Chip: contains 4 cores; peak performance of 13.6 GFLOP/s 12

13 Blue Gene/P Compute Card (Node)
Blue Gene/P compute ASIC: 4 cores, 8 MB cache, Cu heatsink
SDRAM-DDR2: 2 or 4 GB memory
Node card connector: network, power 13

14 Blue Gene/P Node (schematically) 14

15 Blue Gene/P Node Card 32 Compute cards Up to 2 I/O nodes Air-cooled 15

16 JUGENE Building Blocks
Chip: 4 processors, 13.6 GF/s
Compute Card: 1 chip, 13.6 GF/s, 2 GB DDR2
Node Card: 32 compute cards (32 chips, 4x4x2), 0-1 I/O cards, 435 GF/s, 64 GB
Midplane: 16 node cards, cabled 8x8x8, ~7 TF/s, 1 TB
Rack: 32 node cards (2 midplanes), cabled 8x8x16, ~14 TF/s, 2 TB
System: 72 racks (9x8), 1 PF/s (825.5 TF/s LINPACK), 144 TB; largest TORUS: 9x4x4 (in midplanes) 16

17 Different Blue Gene/P Nodes Overview
Login Nodes (3): IBM p6 550 processors (8 cores per node), 4.2 GHz, 128 GB RAM, SuSE Linux (SLES 10); Internet address: jugene.fz-juelich.de
Compute Nodes (73,728): PowerPC 450 32-bit microprocessor (4 cores per node), 850 MHz, 2 GB RAM, Compute Node Kernel (CNK, a light-weight kernel); only accessible via LoadLeveler 17

18 Different Blue Gene/P Nodes Overview
Service Nodes (2): IBM p6 550 processors (8 cores per node), 4.2 GHz, 128 GB RAM, SuSE Linux (SLES 10); not accessible for users
I/O Nodes (600): hardware similar to the compute nodes; responsible for handling I/O; no login possible for users 18

19 Blue Gene/P Networks
3-Dimensional Torus: communication backbone for applications; bandwidth: 425 MB/s per link (5.1 GB/s bidirectional per node); usable for job sizes of at least 512 nodes (1 midplane)
Collective Network (Global Tree): one-to-all broadcast and reduction-operation functionality; 850 MB/s per link; interconnects all compute and I/O nodes
Other Networks: global barrier and interrupt network; external I/O network (10 GE, external communication); control network (1 GE, service/support purposes) 19

20 JUGENE I/O Subsystem
I/O exclusively via I/O nodes
Distribution of the 600 I/O nodes: 71 racks equipped with 4 I/O nodes per midplane (1 per 128 CN); 1 rack equipped with 16 I/O nodes per midplane (1 per 32 CN)
Connected to GPFS (General Parallel File System, IBM): 28 IBM p6 servers, around 6000 disks 20

21 JUGENE File System Classes
Home ($HOME): accessible from FEN and CN; backup; group limits: 6 TB, 2 million files; bandwidth: ~8.5 GB/s (using 4 midplanes)
Scratch ($WORK): accessible from FEN and CN; no backup; group limits: 20 TB, 4 million files; bandwidth: ~33 GB/s (using 16 midplanes)
Caution: Files older than 90 days on $WORK will be deleted without warning! 21

22 JUGENE File System Classes
Archive ($ARCH): accessible from FEN only; use it for storage of large files (only for the lifetime of your project!); group limits: none, but please contact us if you need more than 20 TB; bandwidth: ~2.8 GB/s
Important: Use large files only (use tar as a container for many small files); retrieving 1 file from tape takes 2 minutes plus the time for the transfer
Do NOT rename files in $ARCH 22

23 Outline A brief Overview Best Practice JUGENE System Architecture and Configuration Production Environment Basic Porting Tuning Applications Performance Analysis Debugging 23

24 Module Environment
Access to software via the module command: module <option>
<no option>     Lists available options
avail           Lists all available modules
list            Lists modules currently loaded
load <module>   Loads <module>
help <module>   Lists information about <module>
show <module>   Shows the settings made by <module>
Module categories: COMPILER, IO, MATH, MISC, SCIENTIFIC, TOOLS 24
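For example, a typical session combining these commands might look like this (the module name is taken from the performance tools listed later in this document):
module avail                 # list all available modules
module load scalasca/1.4.1   # load the Scalasca performance analysis tool
module list                  # show the modules currently loaded
module show scalasca/1.4.1   # inspect the settings made by the module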

25 Software Software for front-end and compute nodes Front-end nodes: /usr/local Compute nodes : /bgsys/local Installed software packages (selection) CPMD, GPAW, Gromacs, LAMMPS, Namd BLACS, BLAS, FFTW, LAPACK, PETSc, ScaLAPACK, SPRNG DDT, HDF5, netcdf, Scalasca, SIONlib, Tau, Totalview, Vampir 25

26 Running Simulations Batch System
Execution of applications is managed by LoadLeveler.
llsubmit <jobfile>   Sends a job to the queuing system
llq                  Lists all queued and running jobs
llcancel <job ID>    Kills the specified job
llstatus             Displays the status of LoadLeveler
llclass              Lists existing classes and their properties
llqx                 Shows detailed information about all jobs 26
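A typical cycle, sketched with an assumed job file name:
llsubmit example.job   # submit the job command file
llq -u $USER           # check your queued and running jobs
llcancel <job ID>      # remove a job that is no longer needed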

27 Running Simulations Job command files
Two blocks of instructions:
1. LoadLeveler keyword block: keywords start with #@ and determine the job configuration (number of nodes, wall time, network, etc.)
2. Application block: a regular shell script block; can contain any shell command and call(s) to applications 27

28 Running Simulations mpirun command
Launch command for parallel applications: mpirun <options>
-args prg_args      Passes prg_args to the launched application on the CN
-env var=value      Sets the environment variable var to value
-env_all            Exports all environment variables to the job on the CN
-exe <executable>   Specifies the executable
-mapfile            Specifies the process mapping (default: XYZT)
-mode               Specifies the execution mode (default: SMP)
-verbose            Specifies the verbosity level 28

29 Running Simulations Execution Mode I SMP (Symmetrical MultiProcessing) Default mode Each CN executes 1 (MPI) process Up to 4 (OpenMP/POSIX) threads per process Maximum amount of memory per process (2 GB) Shared memory between all threads 29

30 Running Simulations Execution Mode II DUAL Each CN executes 2 (MPI) processes Up to 2 (OpenMP/POSIX) threads per process 1 GB per process available Shared memory between threads of the same (MPI) process 30

31 Running Simulations Execution Mode III VN (Virtual Node) Each CN executes 4 (MPI) processes Intended for MPI-only parallelization schemes (one thread per process) 0.5 GB per process available Caution: Pure MPI applications must use this mode on JUGENE to utilize all four cores of a node 31

32 Running Simulations Job example I Pure MPI applications
#!/bin/bash
#@job_name = example_1
#@comment = "Pure MPI application"
#@error = $(job_name).$(jobid).err
#@output = $(job_name).$(jobid).out
#@environment = COPY_ALL
#@wall_clock_limit = 24:00:00
#@notification = error
#@notify_user = my_address@my.institution
#@job_type = bluegene
#@bg_size = 2048
#@bg_connection = TORUS
#@queue
mpirun -mode VN -mapfile TXYZ -verbose 2 -exe application.x 32

33 Running Simulations Job example II Hybrid MPI/OpenMP applications
#!/bin/bash
#@job_name = example_2
#@comment = "MPI/OpenMP application"
#@error = $(job_name).$(jobid).err
#@output = $(job_name).$(jobid).out
#@environment = COPY_ALL
#@wall_clock_limit = 12:00:00
#@notification = complete
#@notify_user =
#@job_type = bluegene
#@bg_size = 1024
#@bg_connection = TORUS
#@queue
mpirun -verbose 2 -env OMP_NUM_THREADS=4 -exe application.x 33

34 Interactive Access to CN Direct login to compute nodes not possible Job under interactive control with llrun Runs under LoadLeveler Up to 30 minutes in general At most 256 nodes Invocation: llrun [llrun_options] [mpirun_options] executable 34
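For example, a short interactive test of a pure-MPI binary could be started like this (the task count is chosen for illustration only):
llrun -np 128 -mode VN application.x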

35 Outline A brief Overview Best Practice JUGENE System Architecture and Configuration Production Environment Basic Porting Tuning Applications Performance Analysis Debugging 35

36 Compilers I
Different compilers for front-end and compute nodes; GNU and IBM XL families of compilers are available.
Tip: It is recommended to use the XL suite of compilers since they generally produce better optimized code.
XL compilers for CN code (thread-safe MPI wrappers: *_r):
C        bgxlc, bgc89, bgc99                  MPI wrapper: mpixlc
C++      bgxlc++, bgxlc                       MPI wrapper: mpixlcxx
Fortran  bgxlf, bgxlf90, bgxlf95, bgxlf2003   MPI wrappers: mpixlf77, mpixlf90, mpixlf95, mpixlf2003 36

37 Compilers II
GNU compilers for CN code (thread-safe MPI wrappers: *_r):
C        powerpc-bgp-linux-cc         MPI wrapper: mpicc
C++      powerpc-bgp-linux-g++        MPI wrapper: mpicxx
Fortran  powerpc-bgp-linux-gfortran   MPI wrappers: mpif77, mpif90
Compilers for the front-end node:
C        xlc (XL), gcc (GNU)
C++      xlc++, xlc (XL), g++ (GNU)
Fortran  xlf, xlf90, xlf95, xlf2003 (XL), gfortran (GNU) 37

38 Compiler Flags (XL Compilers)
Flags in order of increasing optimization potential:
-O2 -qarch=450 -qtune=450            Basic optimization
-O3 -qstrict -qarch=450 -qtune=450   More aggressive, no impact on accuracy
-O3 -qhot -qarch=450 -qtune=450      More aggressive, may influence accuracy (high-order transformations of loops)
-O4 -qarch=450 -qtune=450            Interprocedural optimization at compile time
-O5 -qarch=450 -qtune=450            Interprocedural optimization at link time, whole-program analysis 38
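Putting wrappers and flags together, a typical optimized build of an MPI Fortran code might look like this (file names are placeholders, not from the original slides):
mpixlf90_r -O3 -qstrict -qarch=450 -qtune=450 -o application.x application.f90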

39 ESSL
Engineering and Scientific Subroutine Library (IBM): high-performance mathematical library, tuned for Blue Gene/P; can be called from C, C++, and Fortran; a multi-threaded (SMP) version is available
Highly recommended if you use BLAS or LAPACK routines:
mpixlf90_r -o prog.x prog.f -L/bgsys/local/lib -lesslbg
mpixlf90_r -o prog.x prog.f -L$(LAPACK_LIB) -llapack -L/bgsys/local/lib -lesslbg
Caution: Pay attention to the order of the libraries when linking; it determines which implementation of a symbol the linker picks up. 39

40 MPI Message-Passing Interface
Based on the MPICH2 implementation; MPI 2.1 standard, BUT:
Spawning of tasks is not supported
Asynchronous (non-blocking) I/O (e.g. MPI_File_iwrite, MPI_File_iread) is not supported
Compiling applications: MPI wrappers (see compilers)
Running applications: a. using mpirun in batch jobs (LoadLeveler, mpirun); b. using llrun for interactive jobs (llrun) 40

41 OpenMP Open Multi-Processing
XL compiler family: the full OpenMP 2.5 standard is supported; compiler flag: -qsmp=omp; use a thread-safe compiler version (e.g. mpixlf90_r, mpixlc_r, etc.)
GNU compiler family: the default GNU compiler (4.1.2) does not support OpenMP, use version 4.3.2 (module load gcc/4.3.2); the full OpenMP 2.5 standard is supported; compiler flag: -fopenmp
Execution modes: SMP (4 threads per task), DUAL (2 threads per task) 41
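A hybrid MPI/OpenMP code would then be built with a thread-safe wrapper, for example (file names are placeholders):
mpixlc_r -qsmp=omp -O3 -qarch=450 -qtune=450 -o hybrid.x hybrid.c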

42 Outline A brief Overview Best Practice JUGENE System Architecture and Configuration Production Environment Basic Porting Tuning Applications Performance Analysis Debugging 42

43 Diagnostic Compiler Flags (XL Compilers)
Diagnostic messages are given on the terminal and/or in a separate file
-qreport: the compiler generates a file <name>.lst for each source file
-qlist: compiler listing including an object listing
-qlistopt: options in effect during compilation are included in the listing
Listen to the compiler:
-qflag=<listing-severity>:<terminal-severity> with i: informational messages, w: warning messages, s: severe errors; use -qflag=i:i to get all information
-qxflag=diagnostic 43

44 Example 1: Compiler Diagnostics
subroutine mult(c,a,ndim)
implicit none
integer :: ndim,i,j
double precision :: a(ndim),c(ndim,ndim)
! Loop
do i=1,1000
  do j=1,1000
    c(i,j) = a(i)
  enddo
enddo
end subroutine mult
Excerpt from the compiler listing:
>>>>> LOOP TRANSFORMATION SECTION <<<<<
1 SUBROUTINE mult (c, a, ndim)
[...]
10 IF (.FALSE.) GOTO lab_9
[...]
$.CIV2 = 0
Id=1 DO $.CIV2 = $.CIV2,
Id=2 DO $.LoopIV1 = $.LoopIV1, 999
[...]
Loop interchanging applied to loop nest
Outer loop has been unrolled 2 time(s) 44

45 Single-Core Optimization Compiler Flags
-On (n=0,1,2,3,4,5): activates different optimization options; check your results!
-qfloat=<suboption>: performance gain, but may result in IEEE 754 non-compliant code; [no]maf: combined multiply-add instructions; [no]rrm: round-to-nearest mode
-qipa=<suboptions>: inline=<suboptions>|procedure1[,procedure2[,...]]: inline the specified routines; level=[0|1|2]: level of interprocedural analysis 45

46 Single-Core Optimization Double FPU
-qarch=450d: activates SIMD (Single Instruction Multiple Data) mode (vectorization)
Needs -qhot=simd (only valid with -qarch=450d)
Compare performance with -qarch=450
Data needs to be 16-byte aligned
Check whether the compiler correctly applied SIMD (-qreport -qlist); *.lst file:
[...]
fpmadd FPMADD fp3,fp35=fp2,fp34,fp1,fp33,fp0,fp32,fcr
[...] 46

47 Single-Core Optimization Double FPU
Help the compiler with explicit alignment assertions (XL compilers): __alignx(16,a) (C/C++), call alignx(16,a(1)) (Fortran)
Avoid non-unit-stride data access, loops with calls to functions/subroutines, and many-branch or conditional instructions
Use ESSL wherever possible 47
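As an illustration only (not from the original slides, and assuming the arrays really are 16-byte aligned; in XL C the assertion is the __alignx built-in, assumed here to be made available via builtins.h):
#include <builtins.h>   /* XL C built-in functions such as __alignx (assumption) */

/* Scale a vector; the alignment assertions allow the XL compiler,
   with -qarch=450d -qhot=simd, to generate double-FPU SIMD code. */
void scale(double *a, const double *b, int n)
{
    int i;
    __alignx(16, a);   /* promise: a is 16-byte aligned */
    __alignx(16, b);   /* promise: b is 16-byte aligned */
    for (i = 0; i < n; i++)
        a[i] = 2.0 * b[i];
}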

48 Example 2: Matrix Multiplication
program mat_mul
Integer, parameter :: n = 1024
Real*8  :: a(n,n), b(n,n), c(n,n)
Integer :: i, j
[...]
Do i = 1,n
  Do j = 1,n
    c(i,j) = sum(a(i,:)*b(:,j))
  End Do
End Do
[...]
1. Basic optimization: -O2 -qarch=450 -qtune=450 ... ms
2. More aggressive optimization: -O3 -qarch=450 -qtune=450 ... ms
3. Automatic SIMD: -O3 -qarch=450d -qtune=450 -qstrict ... ms 48

49 Example 2: Matrix Multiplication
[...]
Do j = 1,n
  Do k = 1,n
    Do i = 1,n
      c(i,j) = c(i,j) + a(i,k)*b(k,j)
    End Do
  End Do
End Do
[...]
[...]
call dgemm(transa,transa,n,n,n,alf,a,n,b,n,alf,d,n)
[...]
4. Cache-aware code: -O3 -qarch=450d -qtune=450 -qstrict 3.5 ms
5. Use BLAS from ESSL: -O3 -qarch=450d -qtune=450 -L/bgsys/local/lib -lesslbg 0.8 ms
Performance gain: about a factor of 90! 49

50 Advanced MPI - MPI Extensions for BGP
Extensions to the MPI standard to ease use of the BGP hardware; only C/C++ interfaces are available; include header: #include <mpix.h>
Examples:
int MPIX_Cart_comm_create(MPI_Comm *cart_comm): creates a four-dimensional Cartesian communicator that mimics the exact hardware on which it is run
int MPIX_Pset_same_comm_create(MPI_Comm *pset_comm): creates a communicator so that all nodes in the same communicator are served by the same I/O node
See the IBM Redbook "IBM System Blue Gene Solution: Blue Gene/P Application Development" for a full list of the available MPI extensions. 50
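A minimal sketch of how these calls might be used (error handling omitted; based only on the prototypes above, not on code from the original slides):
#include <stdio.h>
#include <mpi.h>
#include <mpix.h>   /* Blue Gene/P MPI extensions */

int main(int argc, char **argv)
{
    int world_rank, pset_rank;
    MPI_Comm cart_comm, pset_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* 4D Cartesian communicator (X,Y,Z,T) matching the torus hardware */
    MPIX_Cart_comm_create(&cart_comm);

    /* Communicator of all tasks served by the same I/O node */
    MPIX_Pset_same_comm_create(&pset_comm);

    /* Example use: one task per pset acts as I/O master for its group */
    MPI_Comm_rank(pset_comm, &pset_rank);
    if (pset_rank == 0)
        printf("Task %d is the I/O master of its pset\n", world_rank);

    MPI_Finalize();
    return 0;
}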

51 Advanced MPI - Environment Variables
BGLMPIO_COMM=0|1: how data is exchanged during collective I/O; 0: MPI_Alltoallv (default), 1: MPI_Isend/MPI_Irecv
DCMF_ALLTOALL_PREMALLOC=Y|N: pre-allocation of the (6) arrays for all-to-all communication
DCMF_EAGER=<message size>: message size above which the rendezvous protocol is used (high bandwidth, but with an initial handshake); default: 1200 bytes; for short messages, reduce it (even down to 0 bytes)
DCMF_INTERRUPT=0|1: set to 1 if your code overlaps communication and computation 51
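As a sketch, such variables can be passed to the application with the mpirun -env option described earlier; the values below are purely illustrative:
mpirun -mode VN -env DCMF_EAGER=600 -env DCMF_INTERRUPT=1 -exe application.x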

52 Tuning the Runtime Environment
Network: use the TORUS network; use job sizes of 2^n * 512 nodes
Shape: extension of a partition in the X, Y, and Z directions; the optimal shape is application dependent; a compact shape is in general beneficial
Example: for an 8x8x32 communicator, the shape 1x1x2 might be better than 2x1x1
[...]
#@bg_shape = 1x1x2
#@bg_rotate = FALSE
[...] 52

53 Tuning the Runtime Environment
Mapping of processes onto the TORUS
Default mapping: XYZT (X varies fastest); X, Y, and Z are the TORUS coordinates, T is the core coordinate within one node (T=0,1,2,3)
Specify the mapping with the mpirun option -mapfile: mpirun -mapfile <mapping> -exe <application>
<mapping> can be any permutation of X, Y, Z, and T or the name of an ASCII map file
Tip: It is recommended to test at least the option -mapfile TXYZ 53

54 File I/O Task-local Files
Example: creating files in parallel in the same directory (JUGENE, IBM Blue Gene/P, GPFS file system /work, using fopen()).
[Chart: time (s) versus number of files, 1k to 256k. The parallel creation of task-local files takes a few seconds for small counts (e.g. 3.2 s at 1k files) but grows to more than 3 minutes (182.3 s), more than 11 minutes (676.7 s) and more than 33 minutes (about 2000 s) towards 256k files.]
Caution: Creation of 256k files costs core hours!
Task-local files are not an option on Blue Gene! 54

55 File I/O Guidelines
Use the correct file system: $WORK has the best performance, use it!
Pay attention to block alignment: best performance if the I/O is done in chunks of the file system block size ($WORK: 4 MB block size); stat() or fstat() can be used to get the block size
Distribute the I/O over the I/O nodes: try to use as many I/O nodes as possible (MPIX routines)
Use parallel I/O (libraries): MPI I/O, ADIOS, HDF5, netCDF, SIONlib 55
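For the block-alignment point, a small sketch using POSIX stat() (the path is a placeholder, not from the original slides):
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    struct stat st;

    /* Query the preferred I/O block size of the target file system */
    if (stat("/work/myproject", &st) == 0)
        printf("Preferred I/O block size: %ld bytes\n", (long)st.st_blksize);
    else
        perror("stat");

    /* Writing in multiples of this block size, aligned to block
       boundaries, gives the best GPFS performance. */
    return 0;
}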

56 SIONlib An Alternative to Task-Local Files
SIONlib (Scalable I/O library for parallel access to task-local files)
Collective I/O to binary shared files; logical task-local view of the data
Collective open/close, independent write/read
Support for MPI, OpenMP, MPI+OpenMP
C, C++, and Fortran wrappers
Tools for manipulating files are available (extraction of data, splitting of files, dumping of metadata, etc.) 56

57 SIONlib An Alternative to Task-Local Files
Task-local files:
/* Open */
sprintf(tmpfn, "%s.%06d", filename, my_nr);
fileptr = fopen(tmpfn, "bw", ...);
[...]
/* Write */
fwrite(bindata, 1, nbytes, fileptr);
[...]
/* Close */
fclose(fileptr);
The same I/O with SIONlib:
/* Collective Open */
nfiles = 1; chunksize = nbytes;
sid = sion_paropen_mpi(filename, "bw", &nfiles, &chunksize, MPI_COMM_WORLD, &lcomm, &fileptr, ...);
[...]
/* Write */
sion_fwrite(bindata, 1, nbytes, sid);
[...]
/* Collective Close */
sion_parclose_mpi(sid); 57

58 Outline A brief Overview Best Practice JUGENE System Architecture and Configuration Production Environment Basic Porting Tuning Applications Performance Analysis Debugging 58

59 Performance Analysis Tools
Accessible via the module command: module load UNITE <tool>
Available tools (selection):
gprof (GNU profiler)       no module
marmot (MPI checker)       marmot/2.4.0
papi (hardware counters)   papi/4.0.0
Scalasca                   scalasca/1.4.1
TAU                        tau/
Vampir                     vampir/7.5-fe
Basic help, first steps and references: module help <tool> 59

60 Outline A brief Overview Best Practice JUGENE System Architecture and Configuration Production Environment Basic Porting Tuning Applications Performance Analysis Debugging 60

61 Debugging Compiler Options
Compilers give hints
Turn optimization off: -O0 -g
Switch on compile-time and runtime checks (floating-point exceptions, boundary checks, pointer checks, etc.):
-qcheck[=<suboptions>] (C/C++)
-qformat[=<suboptions>] (C/C++)
-qinfo=[<suboptions>][,<groups_list>] (C/C++)
-C (Fortran)
-qsigtrap[=<trap_handler>] (Fortran)
-qinit=f90ptr (Fortran)
-qflttrap[=<suboptions>] 61

62 Using Debuggers
Prepare the executable: -O0 -g -qfullpath
Available debuggers: DDT, Totalview
Important: To use tools with a GUI you need an X server running on your local machine; connect to JUGENE with ssh -X 62
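For example, a debug build of a Fortran MPI code could look like this (file names are placeholders); the debugger itself is then started as shown on the next slide:
mpixlf90_r -O0 -g -qfullpath -o application.x application.f90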

63 Example: Totalview debugger Load the corresponding modules module load UNITE totalview Start application under Totalview control llrun -np <ntasks> -mode VN -tv application.x 63

64 Navigation (TotalView GUI screenshot): source code window, action points, process and thread view

65 Debugging Analyzing Core Dump Files
Activate the generation of core dump files: mpirun -env BG_COREDUMPDISABLED=0 [...]
Core dump files are plain ASCII files containing information in hexadecimal
Generate your executable with the -g flag
Analyzing core dump files:
1. addr2line: addr2line -e application.x <hexaddr>
2. Totalview
3. DDT 65

66 References and Further Information
Carlos Sosa and Brant Knudson: IBM System Blue Gene Solution: Blue Gene/P Application Development. IBM Redbook, 4th edition, 11 September. Copyright 2007, 2009 International Business Machines Corporation.
JSC websites
PRACE JUGENE Best Practice Guide: Guides?lang=en
SIONlib
XL Compilers for Blue Gene [index.jsp?topic=/com.ibm.bg9111.doc/bgusing/related.htm]
Using the IBM XL Compilers for Blue Gene
IBM ESSL documentation [uster.essl.doc/esslbooks.html] 66

67 Thank You! JSC User and application support: Tel:
