Mitglied der Helmholtz-Gemeinschaft
JUQUEEN Best Practices
Florian Janetzko
29. November 2013

2 Outline
- Production Environment: Module Environment, Job Execution
- Basic Porting: Compilers and Wrappers, Compiler/Linker Flags
- Tuning Applications: Advanced Compiler/Linker Flags, QPX

3 JUQUEEN System Architecture
IBM Blue Gene/Q JUQUEEN:
- IBM PowerPC A2, 1.6 GHz, 16 cores/node, 4-way SMT, 64-bit
- 4-wide (double precision) SIMD with FMA
- 16 GB RAM per node
- Torus network
- 28 racks, 458,752 cores, 5.9 Petaflop/s peak
- Connected to a Global Parallel File System (GPFS) with 10 PByte online disk and 37 PByte offline tape capacity

4 JUQUEEN Challenges
- Chip: 4-way SMT with 1 integer + 1 FPU instruction per cycle -> keeping the pipes filled; 4-wide SIMD -> efficient vectorization
- Memory: 1 GB per core (0.5 GB per task for pure MPI codes) -> watch memory consumption; HW support for transactional memory -> efficient usage
- Network: torus network -> mapping of tasks and communicators to the topology (communication pattern)
- I/O: processing large amounts of data -> efficient I/O strategy and management
- Parallelism: MPP system -> scalability


6 Module Environment
The module concept:
- Provides an overview of available software packages and eases their use
- Access to software packages and libraries
- Supply of different versions of applications
- Supply of application-specific information
- Enables dynamic modification of the user's environment: environment variables (PATH, LD_LIBRARY_PATH, MANPATH, ...) are set appropriately
- Detection of conflicts between applications
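What a module load does under the hood can be sketched in plain shell: the package's directories are prepended to the search-path variables. The fftw3 prefix below is made up for illustration, not the actual module contents:

```shell
# Sketch of the environment changes a module performs on load.
PKG_ROOT="/bgsys/local/fftw3"               # hypothetical install prefix
export PATH="${PKG_ROOT}/bin:${PATH}"
export LD_LIBRARY_PATH="${PKG_ROOT}/lib:${LD_LIBRARY_PATH:-}"

echo "first PATH entry: $(echo "$PATH" | cut -d: -f1)"
```

Unloading a module reverses exactly these changes, which is why the module command can also detect conflicting packages.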

7 Module Environment
$ module <options> <module>

Option        Description
(no option)   Lists available options of the module command
avail         Lists all available modules
list          Lists modules currently loaded
load          Loads a module
unload        Unloads a module
help          Lists information about a module
show          Information about settings done by the module
purge         Unloads all modules

8 Module Environment
Six module categories:
COMPILER     Different compilers and versions of compilers
IO           I/O libraries and tools
MATH         Mathematical libraries and software packages
MISC         Software not fitting into another category
SCIENTIFIC   Software packages from different scientific fields
TOOLS        Performance analysis, debuggers, etc.
Software for compute nodes: /bgsys/local; for front-end nodes: /usr/local

9 Module Environment
$ module avail

/bgsys/local/modulefiles/scientific
cp2k/  cpmd/3.15.1_a  cpmd/3.15.1_b  cpmd/3.15.1_c(default)
lammps/5may12(default)  lammps/30aug12  libint/
namd/2.8  namd/2.9(default)

/bgsys/local/modulefiles/tools
/usr/local/modulefiles/compiler
cmake/2.8.8(default)
/usr/local/modulefiles/math
/usr/local/modulefiles/scientific
/usr/local/modulefiles/io
/usr/local/modulefiles/tools
UNITE/
/usr/local/modulefiles/misc

10 Module Environment: Applications & Libraries
Selected mathematical applications and libraries:
arpack (2.1), fftw (2.1.5, 3.3.3), gmp (5.0.5), gsl (1.15), hypre (2.9.0), lapack (3.4.2), mumps (4.10.0), parmetis (4.0.2), petsc (3.4.2), scalapack (2.0.2), sprng (2.0), sundials (2.5.0)
Selected scientific applications:
CPMD (3.15.3), CP2K ( ), GPAW*, Gromacs (4.5.5), Lammps (5May12, 30Aug12), Namd (2.8, 2.9), OpenFOAM (2.1.1), QE (5.0.1), VASP**
* In preparation
** Software not installed, but makefiles are available

11 Module Environment: Applications & Libraries
Selected I/O libraries and tools:
Darshan (2.2.4), HDF5 (1.8.11), netcdf (4.3), parallel-netcdf (1.3.1), SIONlib (1.4.3)
Selected tools:
Clang (3.4), Cmake (2.8.11), extrae (2.4.1), hpctoolkit (5.3.2), PAPI (5.1.1), Scalasca (2.1), Tau (2.22.3b4), Totalview (8.12.0), Vampir (8.1)


13 Running Simulations: Batch System
- Execution of applications is managed by LoadLeveler
- Users submit jobs using a job command file
- LoadLeveler allocates computing resources to run jobs
- The scheduling of jobs depends on the availability of resources and on job priority (jobs with larger core counts are privileged)
- Jobs run in queues (job classes); the class is chosen by LoadLeveler according to the core count of the job
- For information about LoadLeveler on JUQUEEN, see the JSC web pages

14 LoadLeveler Commands

Command               Description
llsubmit <jobfile>    Sends a job to the queuing system
llq                   Lists all queued and running jobs
llq -l <job ID>       Detailed information about the specified job
llq -s <job ID>       Detailed information about a specific queued job, e.g. expected start time
llq -u <user>         Lists all jobs of the specified user
llcancel <job ID>     Kills the specified job
llstatus              Displays the status of LoadLeveler
llclass               Lists existing classes and their properties
llqx                  Shows detailed information about all jobs

15 LoadLeveler Command Examples
Submitting a batch job: llsubmit
$ llsubmit batch-job.js
llsubmit: Processed command file through Submit Filter: "/bgdata/admin/loadl/extensions/filter".
llsubmit: The job "juqueen2c1.zam.kfa-juelich.de.35395" has been submitted.

Query status of submitted jobs: llq
$ llq -u userid
Id Owner Submitted ST PRI Class Running On
juqueen2c userid 11/15 10:11 I 50 n001
1 job step(s) in query, 1 waiting, 0 pending, 0 running, 0 held, 0 preempted

Cancel a submitted job: llcancel
$ llcancel juqueen2c
llcancel: Cancel command has been sent to the central manager.

16 LoadLeveler Job Command File
An ASCII file containing two major parts:
1. A block of LoadLeveler job keywords at the beginning of the file. Keywords have the form # @<keyword>; the # and @ can be separated by any number of blanks.
2. One or more application script blocks: a regular shell script that can contain any shell command.

17 LoadLeveler Standard Keywords

Keyword                                              Description
#@job_name = <name>                                  Name of the job
#@notification = complete|error|never|start|always   Send notification: when the job has finished (complete), if the job returned a non-zero error code (error), never, upon the start of the job (start), or for a combination of start, end, and error (always)
#@notify_user = <mail address>                       Mail address to send messages to
#@wall_clock_limit = <time>                          Requested wall time for the job
#@output = <file name for stdout>                    Specifies the corresponding file names
#@error = <file name for stderr>
#@environment = <variables>|COPY_ALL                 Environment variables to be exported to the job
#@queue                                              Queue the job

18 LoadLeveler Blue Gene/Q Keywords

Keyword                                       Description
#@job_type = bluegene                         Specifies the type of job step to process; must be set to bluegene for parallel applications
#@bg_size = <number of nodes>                 Size of the Blue Gene job; the keywords bg_size and bg_shape are mutually exclusive
#@bg_shape = <shape>                          Specifies the requested shape of a job (in midplanes)
#@bg_rotate = true|false                      Whether the scheduler should consider all possible rotations of the given shape
#@bg_connectivity = TORUS|MESH|Xa Xb Xc Xd    Type of wiring requested for the block (can be specified for each dimension separately)

19 LoadLeveler Job Classes

Class name   #Nodes   Max. run time   Default run time
n...         ...      ...:30:00       00:30:00
n...         ...      ...:30:00       00:30:00
n...         ...      ...:00:00       06:00:00
n...         ...      ...:00:00       06:00:00
m...         ...      ...:00:00       06:00:00
m...         ...      ...:00:00       06:00:00
m...         ...      ...:00:00       06:00:00
m...         ...      ...:00:00       06:00:00
m016*        ...      ...:00:00       06:00:00
m032*        ...      ...:00:00       06:00:00
m048*        ...      ...:00:00       06:00:00
m056*        ...      ...:00:00       06:00:00
* On demand only

You will be charged for the full partition (e.g. if you request 513 nodes, you will be charged for 1024 nodes!). Always use full partitions!
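The charging rule can be sketched as a small shell helper. This assumes, for illustration only, that partitions come in power-of-two sizes starting at 32 nodes; the actual partition sizes are defined by the job classes above:

```shell
# Round a node request up to the next full partition.
# Partition sizes are assumed to be powers of two starting at 32 (illustrative).
charged_nodes() {
  req=$1
  part=32
  while [ "$part" -lt "$req" ]; do
    part=$((part * 2))
  done
  echo "$part"
}

charged_nodes 513   # requesting 513 nodes is charged as a full 1024-node partition
charged_nodes 512   # a full partition is charged exactly as requested
```

This is why requesting 513 nodes is so wasteful: one node over a partition boundary doubles the charge.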

20 LoadLeveler Job Scheduling
Backfill scheduler:
- The biggest job has the highest priority (Top Dog)
- LoadLeveler fills gaps with smaller, short-running jobs while freeing the system for the Top Dog
Tip: Specify the wall time for your jobs as exactly as possible; jobs requesting a shorter wall time have a better chance of being executed.
Big jobs:
- Jobs requesting 8 racks are collected and run in dedicated time slots (e.g. after a maintenance), at least once a week

21 Running Simulations: the runjob Command
Launch command for parallel applications:
runjob [options] --exe <executable>
runjob [options] : <executable> [arguments]

Option                      Description
--args <prg_arg>            Passes "prg_arg" to the launched application on the compute node
--exe <executable>          Specifies the full path to the executable
--envs <ENV_Var=Value>      Sets the environment variable ENV_Var to Value
--exp-env <ENV_Var>         Exports the environment variable ENV_Var from the current environment to the job
--np <number>               Total number of (MPI) tasks
--ranks-per-node <number>   Number of (MPI) tasks per compute node
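Since --np is bounded by bg_size times --ranks-per-node, the value can be derived in the job script rather than hard-coded. A minimal sketch (the variable names are illustrative, not runjob options):

```shell
# Derive the total task count from the partition size and the ranks per node.
BG_SIZE=512          # must match the #@bg_size keyword
RANKS_PER_NODE=32
NP=$((BG_SIZE * RANKS_PER_NODE))

echo "runjob --np ${NP} --ranks-per-node ${RANKS_PER_NODE} --exe app.x"
```

Computing the product in one place keeps the job file consistent when the partition size is changed later.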

22 LoadLeveler Example Job Command File I
#@job_name = MPI_code
#@comment = "32 ranks per node"
#@output = test_$(jobid)_$(stepid).out
#@error = test_$(jobid)_$(stepid).err
#@environment = COPY_ALL
#@job_type = bluegene
#@notification = never
#@bg_size = 512
#@bg_connectivity = torus
#@wall_clock_limit = 14:00:00
#@queue
runjob --np 16384 --ranks-per-node 32 --exe app.x

Pure MPI applications need to use 32 tasks per node in order to use the architecture efficiently! (512 nodes x 32 ranks per node gives --np 16384.)

23 Running Simulations: MPI/OpenMP Codes
On Blue Gene/P, three modes were available:
1. VN mode (4 MPI tasks per node, no threads)
2. DUAL mode (2 MPI tasks with 2 OpenMP threads each)
3. SMP mode (1 MPI task with 4 OpenMP threads)
On Blue Gene/Q, one node has 16 cores with 4-way SMT each, so several configurations are possible:
ntasks * nthreads = 64, with ntasks = 2^n, 0 <= n <= 6
Test carefully which configuration gives the best performance for your application and setup!
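The constraint ntasks * nthreads = 64 with ntasks = 2^n can be enumerated directly; a small shell sketch:

```shell
# Print every tasks-per-node / threads-per-task combination that
# fills the 64 hardware threads of a Blue Gene/Q node.
configs() {
  for ntasks in 1 2 4 8 16 32 64; do
    echo "${ntasks} tasks x $((64 / ntasks)) threads"
  done
}

configs
```

Each printed pair is a candidate for --ranks-per-node and OMP_NUM_THREADS; benchmarking over this list is the systematic way to find the best setting.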

24 LoadLeveler Example Job Command File II
#@job_name = hybrid_code
#@comment = "16x4 configuration"
#@output = test_$(jobid)_$(stepid).out
#@error = test_$(jobid)_$(stepid).err
#@environment = COPY_ALL
#@job_type = bluegene
#@notification = never
#@bg_size = 512
#@bg_connectivity = torus
#@wall_clock_limit = 14:00:00
#@queue
runjob --np 8192 --ranks-per-node 16 \
       --envs OMP_NUM_THREADS=4 : app.x -i input

25 Monitoring of Jobs
- LoadLeveler: llq [options]
- LLview: a client-server based application
  - Compact summary of different information (e.g. current usage of the system, job prediction, expected and average waiting times, ...)
  - Customizable
  - Developed by W. Frings (JSC)

26 LLview (screenshot)


28 Compilers
Different compilers exist for front-end and compute nodes; the GNU and IBM XL families of compilers are available.
Tip: It is recommended to use the XL suite of compilers for the compute nodes, since they generally produce better-optimized code.

Language   XL compilers                                     GNU compilers
C          xlc, xlc_r                                       gcc
C++        xlc++, xlc++_r, xlC, xlC_r                       g++
Fortran    xlf, xlf90, xlf95, xlf2003 (and *_r variants)    gfortran

29 Compilers for the Compute Nodes
GNU compilers:
Language   Compiler invocation            MPI wrapper
C          powerpc64-bgq-linux-gcc        mpigcc
C++        powerpc64-bgq-linux-g++        mpig++
Fortran    powerpc64-bgq-linux-gfortran   mpigfortran

IBM XL compilers (thread-safe variants: *_r):
Language   Compiler invocation                   MPI wrapper
C          bgxlc, bgc89, bgc99                   mpixlc
C++        bgxlc++, bgxlC                        mpixlcxx
Fortran    bgxlf, bgxlf90, bgxlf95, bgxlf2003    mpixlf77, mpixlf90, mpixlf95, mpixlf2003

Tip: To be on the safe side, always use the corresponding MPI wrappers with _r when compiling for the compute nodes.

30 Basic Compiler/Linker Options: XL Compilers I
Flags in order of increasing optimization potential:

Optimization level                     Description
-O2 -qarch=qp -qtune=qp                Basic optimization
-O3 -qstrict -qarch=qp -qtune=qp       More aggressive, no impact on accuracy
-O3 -qhot -qarch=qp -qtune=qp          More aggressive, may influence accuracy (high-order transformations of loops)
-O4 -qarch=qp -qtune=qp                Interprocedural optimization at compile time
-O5 -qarch=qp -qtune=qp                Interprocedural optimization at link time, whole-program analysis

Some flags need to be used during compilation AND linking. Check the compiler manual; if you are not sure, include the flags used for compiling in the linking step as well!
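Following the advice above to repeat the compile flags at link time, a build script can keep the flags in one variable. A sketch that only echoes the commands; the source file name is made up:

```shell
# Keep compile and link flags identical so options that matter in both
# steps (e.g. -qipa, -qsmp=omp) are never applied to only one of them.
build_cmds() {
  FLAGS="-O3 -qhot -qarch=qp -qtune=qp"
  echo "mpixlf90_r ${FLAGS} -c solver.f90"
  echo "mpixlf90_r ${FLAGS} -o solver solver.o"
}

build_cmds   # echoes the commands instead of running them
```

The same pattern carries over to a Makefile, where FLAGS would appear in both the compile rule and the link rule.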

31 Basic Compiler/Linker Options: XL Compilers II
Additional compiler flags:

Compiler/Linker flag     Description
-qsmp=omp -qthreaded     Switch on OpenMP support
-qreport -qlist          Generates for each source file <name> a file <name>.lst with pseudo code and a description of the code optimizations that were performed
-qessl -lessl[smp]bg     The compiler attempts to replace some intrinsic Fortran 90 procedures by ESSL routines where it is safe to do so

For how to link with ESSL routines, see the JUQUEEN documentation.


33 Diagnostic Compiler Flags (XL Compilers)
Diagnostic messages are given on the terminal and/or in a separate file:
- -qreport: the compilers generate a file <name>.lst for each source file
- -qlist: compiler listing including an object listing
- -qlistopt: options in effect during compilation are included in the listing
Listen to the compiler!
- -qflag=<listing-severity>:<terminal-severity> with i: informational messages, w: warning messages, s: severe errors
- Use -qflag=i:i to get all information
- -qlistfmt=(xml|html)=<option>

34 Example: Compiler Diagnostics

subroutine mult(c,a,ndim)
  implicit none
  integer :: ndim,i,j
  double precision :: a(ndim),c(ndim,ndim)
  ! Loop
  do i=1,1000
    do j=1,1000
      c(i,j) = a(i)
    enddo
  enddo
end subroutine mult

>>>>> LOOP TRANSFORMATION SECTION <<<<<
1 SUBROUTINE mult (c, a, ndim)
[...]
Loop interchanging applied to loop nest
Outer loop has been unrolled 8 time(s).

35 Single-Core Optimization: Compiler/Linker Flags
-qessl (for Fortran codes)
  If either -lessl or -lesslsmp is also specified, ESSL routines are used in place of some Fortran 90 intrinsic procedures when there is a safe opportunity to do so.
-qipa (compiling and linking step)
  Turns on or customizes interprocedural analysis (IPA). High potential for performance benefits, but may considerably increase the time for the compile and link steps!

36 Single-Core Optimization: Compiler/Linker Flags
-qinline[=auto:level=5 | +procedure1[:procedure2[:...]]]
  Attempts to inline procedures instead of generating calls to them, for improved performance. Several suboptions are available; check the man page.
-qsimd=auto
  Enables the automatic generation of vector instructions.
-qtm
  Enables support for transactional memory. Use only with thread-safe compiler wrappers.
-qunroll=yes
  Instructs the compiler to search for more opportunities for loop unrolling than performed with auto.


38 Quad Floating Point Extension Unit (QPX)
4 double-precision pipelines, usable as:
- scalar FPU
- 4-wide FPU SIMD (Single Instruction, Multiple Data)
- 2-wide complex-arithmetic SIMD
8 concurrent floating point operations (FMA) + load + store

39 IBM XL Compiler Support for QPX
Usage of QPX: compiler flag -qsimd=auto
Check that SIMD vectorization is actually done (-qreport -qlist)!

>>>> LOOP TRANSFORMATION SECTION <<<<
[...]
Loop with nest-level 1 and iteration count 1000 was SIMD vectorized
[...]
Loop was not SIMD vectorized because the loop is not the innermost loop
Loop was not SIMD vectorized because it contains memory references with non-vectorizable alignment.

40 QPX: Usage Hints for the Compiler
The compiler needs hints:
- Hint the compiler at likely iteration counts
- Instruct the compiler to align fields
- Tell it that Fortran assumed-shape arrays are contiguous: -qassert=contig

Fortran:
real*8 :: x(:),y(:),a
!ibm* align(32, x, y)
!ibm* assert(itercnt(100))
do i=m, n
  z(i) = x(i) + a*y(i)
enddo

C:
double align(32) *x, *y;
double a;
#pragma disjoint(*x, *y)
#pragma disjoint(*x, a)
#pragma ibm iterations(100)
for (int i=m;i<n;i++)
  z[i] = x[i] + a*y[i];

void foo(double* restrict a1, double* restrict a2) {
  for (int i=0; i<n; i++) a1[i]=a2[i];
}

41 IBM XL QPX Intrinsics
New intrinsic vector type:
- C/C++: vector4double
- Fortran: vector(real(8))
Wide set of elemental functions available: LOAD, STORE, MULT, MULT-ADD, ROUND, CEILING, SQRT, ...
Strengths:
- The user may lay out the calculation by hand if the compiler is not smart enough (e.g. where there is no loop)
- Easy to use: leave stack and register layout and load/store scheduling to the compiler

42 QPX Example Using Compiler Intrinsics

typedef vector4double qv;
qv dx,dy,dz,dx2,dy2,dz2;

for (i=0;i<4;i++) {
  xd[i] = xdipl[j];
  yd[i] = ydipl[j];
  zd[i] = zdipl[j];
}

dx2 = vec_mul(dx,dx);
dy2 = vec_mul(dy,dy);
dz2 = vec_mul(dz,dz);
d = vec_swsqrt(dx2+dy2+dz2);

Source: IBM Corporation

43 User Information and Support
- Information about JUQUEEN: JSC websites, IBM Blue Gene/Q Application Development Redbook
- Dispatch and user support: applications for accounts (for approved projects)
- User Support, Forschungszentrum Jülich GmbH, JSC, Dispatch, Jülich

44 Workshop Announcement
Second JUQUEEN Porting and Tuning Workshop
3-5 February 2014, Forschungszentrum Jülich, Jülich Supercomputing Centre
Contact: Dr. Dirk Brömmel, d.broemmel(at)fz-juelich.de
END


More information

Shared Memory Programming With OpenMP Computer Lab Exercises

Shared Memory Programming With OpenMP Computer Lab Exercises Shared Memory Programming With OpenMP Computer Lab Exercises Advanced Computational Science II John Burkardt Department of Scientific Computing Florida State University http://people.sc.fsu.edu/ jburkardt/presentations/fsu

More information

Blue Gene/Q A system overview

Blue Gene/Q A system overview Mitglied der Helmholtz-Gemeinschaft Blue Gene/Q A system overview M. Stephan Outline Blue Gene/Q hardware design Processor Network I/O node Jülich Blue Gene/Q configurations (JUQUEEN) Blue Gene/Q software

More information

OpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means

OpenMP Programming. Prof. Thomas Sterling. High Performance Computing: Concepts, Methods & Means High Performance Computing: Concepts, Methods & Means OpenMP Programming Prof. Thomas Sterling Department of Computer Science Louisiana State University February 8 th, 2007 Topics Introduction Overview

More information

Introduction to GALILEO

Introduction to GALILEO November 27, 2016 Introduction to GALILEO Parallel & production environment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it SuperComputing Applications and Innovation Department

More information

Introduction to Supercomputing at. Kate Hedstrom, Arctic Region Supercomputing Center (ARSC) Jan, 2004

Introduction to Supercomputing at. Kate Hedstrom, Arctic Region Supercomputing Center (ARSC) Jan, 2004 1 Introduction to Supercomputing at ARSC Kate Hedstrom, Arctic Region Supercomputing Center (ARSC) kate@arsc.edu Jan, 2004 2 Topics Introduction to Supercomputers at ARSC Computers Accounts Getting an

More information

meinschaft May 2012 Markus Geimer

meinschaft May 2012 Markus Geimer meinschaft Mitglied der Helmholtz-Gem Module setup and compiler May 2012 Markus Geimer The module Command Software which allows to easily manage different versions of a product (e.g., totalview 5.0 totalview

More information

Matrix Multiplication on Blue Gene/P User Guide

Matrix Multiplication on Blue Gene/P User Guide Matrix Multiplication on Blue Gene/P User Guide Maciej Cytowski, Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw 1. Introduction This document is a User Guide

More information

Overview on HPC software (compilers, libraries, tools)

Overview on HPC software (compilers, libraries, tools) Overview on HPC software (compilers, libraries, tools) July 5, 2018 Slavko Brdar 1,3, Klaus Goergen 2,3, Inge Gutheil 1, Damian Alvarez 1, Michael Knobloch 1 1 Jülich Supercomputing Centre, Research Centre

More information

Scalasca support for Intel Xeon Phi. Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany

Scalasca support for Intel Xeon Phi. Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany Scalasca support for Intel Xeon Phi Brian Wylie & Wolfgang Frings Jülich Supercomputing Centre Forschungszentrum Jülich, Germany Overview Scalasca performance analysis toolset support for MPI & OpenMP

More information

Before We Start. Sign in hpcxx account slips Windows Users: Download PuTTY. Google PuTTY First result Save putty.exe to Desktop

Before We Start. Sign in hpcxx account slips Windows Users: Download PuTTY. Google PuTTY First result Save putty.exe to Desktop Before We Start Sign in hpcxx account slips Windows Users: Download PuTTY Google PuTTY First result Save putty.exe to Desktop Research Computing at Virginia Tech Advanced Research Computing Compute Resources

More information

Performance analysis on Blue Gene/Q with

Performance analysis on Blue Gene/Q with Performance analysis on Blue Gene/Q with + other tools and debugging Michael Knobloch Jülich Supercomputing Centre scalasca@fz-juelich.de July 2012 Based on slides by Brian Wylie and Markus Geimer Performance

More information

Code Optimization. Brandon Barker Computational Scientist Cornell University Center for Advanced Computing (CAC)

Code Optimization. Brandon Barker Computational Scientist Cornell University Center for Advanced Computing (CAC) Code Optimization Brandon Barker Computational Scientist Cornell University Center for Advanced Computing (CAC) brandon.barker@cornell.edu Workshop: High Performance Computing on Stampede January 15, 2015

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Ekpe Okorafor School of Parallel Programming & Parallel Architecture for HPC ICTP October, 2014 A little about me! PhD Computer Engineering Texas A&M University Computer Science

More information

JÜLICH SUPERCOMPUTING CENTRE Site Introduction Michael Stephan Forschungszentrum Jülich

JÜLICH SUPERCOMPUTING CENTRE Site Introduction Michael Stephan Forschungszentrum Jülich JÜLICH SUPERCOMPUTING CENTRE Site Introduction 09.04.2018 Michael Stephan JSC @ Forschungszentrum Jülich FORSCHUNGSZENTRUM JÜLICH Research Centre Jülich One of the 15 Helmholtz Research Centers in Germany

More information

Programming Environment 4/11/2015

Programming Environment 4/11/2015 Programming Environment 4/11/2015 1 Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent interface

More information

Shared Memory Programming With OpenMP Exercise Instructions

Shared Memory Programming With OpenMP Exercise Instructions Shared Memory Programming With OpenMP Exercise Instructions John Burkardt Interdisciplinary Center for Applied Mathematics & Information Technology Department Virginia Tech... Advanced Computational Science

More information

Open Multi-Processing: Basic Course

Open Multi-Processing: Basic Course HPC2N, UmeåUniversity, 901 87, Sweden. May 26, 2015 Table of contents Overview of Paralellism 1 Overview of Paralellism Parallelism Importance Partitioning Data Distributed Memory Working on Abisko 2 Pragmas/Sentinels

More information

Fujitsu s Approach to Application Centric Petascale Computing

Fujitsu s Approach to Application Centric Petascale Computing Fujitsu s Approach to Application Centric Petascale Computing 2 nd Nov. 2010 Motoi Okuda Fujitsu Ltd. Agenda Japanese Next-Generation Supercomputer, K Computer Project Overview Design Targets System Overview

More information

KISTI TACHYON2 SYSTEM Quick User Guide

KISTI TACHYON2 SYSTEM Quick User Guide KISTI TACHYON2 SYSTEM Quick User Guide Ver. 2.4 2017. Feb. SupercomputingCenter 1. TACHYON 2 System Overview Section Specs Model SUN Blade 6275 CPU Intel Xeon X5570 2.93GHz(Nehalem) Nodes 3,200 total Cores

More information

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah Introduction to Parallel Programming Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Types of parallel computers. Parallel programming options. How to

More information

2014 LENOVO. ALL RIGHTS RESERVED.

2014 LENOVO. ALL RIGHTS RESERVED. 2014 LENOVO. ALL RIGHTS RESERVED. Parallel System description. Outline p775, p460 and dx360m4, Hardware and Software Compiler options and libraries used. WRF tunable parameters for scaling runs. nproc_x,

More information

OpenPBS Users Manual

OpenPBS Users Manual How to Write a PBS Batch Script OpenPBS Users Manual PBS scripts are rather simple. An MPI example for user your-user-name: Example: MPI Code PBS -N a_name_for_my_parallel_job PBS -l nodes=7,walltime=1:00:00

More information

Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing

Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing Shared memory programming model OpenMP TMA4280 Introduction to Supercomputing NTNU, IMF February 16. 2018 1 Recap: Distributed memory programming model Parallelism with MPI. An MPI execution is started

More information

Blue Gene/Q User Workshop. Debugging

Blue Gene/Q User Workshop. Debugging Blue Gene/Q User Workshop Debugging Topics GDB Core Files Coreprocessor 2 GNU Debugger (GDB) The GNU Debugger (GDB) The Blue Gene/Q system includes support for running GDB with applications that run on

More information

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah Introduction to Parallel Programming Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Types of parallel computers. Parallel programming options. How to

More information

Introduction to GALILEO

Introduction to GALILEO Introduction to GALILEO Parallel & production environment Mirko Cestari m.cestari@cineca.it Alessandro Marani a.marani@cineca.it Domenico Guida d.guida@cineca.it Maurizio Cremonesi m.cremonesi@cineca.it

More information

FFTSS Library Version 3.0 User s Guide

FFTSS Library Version 3.0 User s Guide Last Modified: 31/10/07 FFTSS Library Version 3.0 User s Guide Copyright (C) 2002-2007 The Scalable Software Infrastructure Project, is supported by the Development of Software Infrastructure for Large

More information

Beginner's Guide for UK IBM systems

Beginner's Guide for UK IBM systems Beginner's Guide for UK IBM systems This document is intended to provide some basic guidelines for those who already had certain programming knowledge with high level computer languages (e.g. Fortran,

More information

How to compile C/C++ program on SR16000

How to compile C/C++ program on SR16000 How to compile C/C++ program on SR16000 Center for Computational Materials Science, Institute for Materials Research, Tohoku University 2014.1 version 1.0 Contents 1. Compile... 1 1.1 How to use XL C/C++...

More information

Introduction. No Optimization. Basic Optimizations. Normal Optimizations. Advanced Optimizations. Inter-Procedural Optimizations

Introduction. No Optimization. Basic Optimizations. Normal Optimizations. Advanced Optimizations. Inter-Procedural Optimizations Introduction Optimization options control compile time optimizations to generate an application with code that executes more quickly. Absoft Fortran 90/95 is an advanced optimizing compiler. Various optimizers

More information

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

AMD S X86 OPEN64 COMPILER. Michael Lai AMD AMD S X86 OPEN64 COMPILER Michael Lai AMD CONTENTS Brief History AMD and Open64 Compiler Overview Major Components of Compiler Important Optimizations Recent Releases Performance Applications and Libraries

More information

Using the IBM Opteron 1350 at OSC. October 19-20, 2010

Using the IBM Opteron 1350 at OSC. October 19-20, 2010 Using the IBM Opteron 1350 at OSC October 19-20, 2010 Table of Contents Hardware Overview The Linux Operating System User Environment and Storage 2 Hardware Overview Hardware introduction Login node configuration

More information

Introduction to OpenMP

Introduction to OpenMP 1 Introduction to OpenMP NTNU-IT HPC Section John Floan Notur: NTNU HPC http://www.notur.no/ www.hpc.ntnu.no/ Name, title of the presentation 2 Plan for the day Introduction to OpenMP and parallel programming

More information

Introduction to OpenMP

Introduction to OpenMP Introduction to OpenMP Le Yan Scientific computing consultant User services group High Performance Computing @ LSU Goals Acquaint users with the concept of shared memory parallelism Acquaint users with

More information

Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine

Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine Batch Systems & Parallel Application Launchers Running your jobs on an HPC machine Partners Funding Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike

More information

OpenACC Course. Office Hour #2 Q&A

OpenACC Course. Office Hour #2 Q&A OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle

More information

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 11th CALL (T ier-0)

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 11th CALL (T ier-0) TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 11th CALL (T ier-0) Contributing sites and the corresponding computer systems for this call are: BSC, Spain IBM System X idataplex CINECA, Italy The site selection

More information

High Performance Computing Software Development Kit For Mac OS X In Depth Product Information

High Performance Computing Software Development Kit For Mac OS X In Depth Product Information High Performance Computing Software Development Kit For Mac OS X In Depth Product Information 2781 Bond Street Rochester Hills, MI 48309 U.S.A. Tel (248) 853-0095 Fax (248) 853-0108 support@absoft.com

More information

The Cray Programming Environment. An Introduction

The Cray Programming Environment. An Introduction The Cray Programming Environment An Introduction Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent

More information

IBM Scheduler for High Throughput Computing on IBM Blue Gene /P Table of Contents

IBM Scheduler for High Throughput Computing on IBM Blue Gene /P Table of Contents IBM Scheduler for High Throughput Computing on IBM Blue Gene /P Table of Contents Introduction...3 Architecture...4 simple_sched daemon...4 startd daemon...4 End-user commands...4 Personal HTC Scheduler...6

More information

[Scalasca] Tool Integrations

[Scalasca] Tool Integrations Mitglied der Helmholtz-Gemeinschaft [Scalasca] Tool Integrations Aug 2011 Bernd Mohr CScADS Performance Tools Workshop Lake Tahoe Contents Current integration of various direct measurement tools Paraver

More information

Algorithms and Computation in Signal Processing

Algorithms and Computation in Signal Processing Algorithms and Computation in Signal Processing special topic course 18-799B spring 2005 22 nd lecture Mar. 31, 2005 Instructor: Markus Pueschel Guest instructor: Franz Franchetti TA: Srinivas Chellappa

More information

ARM High Performance Computing

ARM High Performance Computing ARM High Performance Computing Eric Van Hensbergen Distinguished Engineer, Director HPC Software & Large Scale Systems Research IDC HPC Users Group Meeting Austin, TX September 8, 2016 ARM 2016 An introduction

More information

IBM PSSC Montpellier Customer Center. Content

IBM PSSC Montpellier Customer Center. Content Content IBM PSSC Montpellier Customer Center Standard Tools Compiler Options GDB IBM System Blue Gene/P Specifics Core Files + addr2line Coreprocessor Supported Commercial Software TotalView Debugger Allinea

More information

Automatic Generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P

Automatic Generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P Automatic Generation of the HPC Challenge's Global FFT Benchmark for BlueGene/P Franz Franchetti 1, Yevgen Voronenko 2, Gheorghe Almasi 3 1 University and SpiralGen, Inc. 2 AccuRay, Inc., 3 IBM Research

More information

IBM Blue Gene/Q solution

IBM Blue Gene/Q solution IBM Blue Gene/Q solution Pascal Vezolle vezolle@fr.ibm.com Broad IBM Technical Computing portfolio Hardware Blue Gene/Q Power Systems 86 Systems idataplex and Intelligent Cluster GPGPU / Intel MIC PureFlexSystems

More information

IFISS is a set of test problems from [Elman, Silvester, Wathen]. Convectiondiffusion problems cd2 and cd4 are the two of interest here.

IFISS is a set of test problems from [Elman, Silvester, Wathen]. Convectiondiffusion problems cd2 and cd4 are the two of interest here. 1 Objective The aim of the work was to install the Trilinos software library on the HPCx service and then to investigate its performance using test problems from the ifiss suite. Trilinos is a collection

More information

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah Introduction to Parallel Programming Martin Čuma Center for High Performance Computing University of Utah m.cuma@utah.edu Overview Types of parallel computers. Parallel programming options. How to write

More information

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah Introduction to Parallel Programming Martin Čuma Center for High Performance Computing University of Utah mcuma@chpc.utah.edu Overview Types of parallel computers. Parallel programming options. How to

More information

Lecture 3: Intro to parallel machines and models

Lecture 3: Intro to parallel machines and models Lecture 3: Intro to parallel machines and models David Bindel 1 Sep 2011 Logistics Remember: http://www.cs.cornell.edu/~bindel/class/cs5220-f11/ http://www.piazza.com/cornell/cs5220 Note: the entire class

More information

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0)

TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) TECHNICAL GUIDELINES FOR APPLICANTS TO PRACE 6 th CALL (Tier-0) Contributing sites and the corresponding computer systems for this call are: GCS@Jülich, Germany IBM Blue Gene/Q GENCI@CEA, France Bull Bullx

More information

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines

Introduction to OpenMP. OpenMP basics OpenMP directives, clauses, and library routines Introduction to OpenMP Introduction OpenMP basics OpenMP directives, clauses, and library routines What is OpenMP? What does OpenMP stands for? What does OpenMP stands for? Open specifications for Multi

More information

PROGRAMMING MODEL EXAMPLES

PROGRAMMING MODEL EXAMPLES ( Cray Inc 2015) PROGRAMMING MODEL EXAMPLES DEMONSTRATION EXAMPLES OF VARIOUS PROGRAMMING MODELS OVERVIEW Building an application to use multiple processors (cores, cpus, nodes) can be done in various

More information