Compilers & Optimized Librairies

Similar documents
Linear Algebra libraries in Debian. DebConf 10 New York 05/08/2010 Sylvestre

Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA

Scientific Computing. Some slides from James Lambers, Stanford

Brief notes on setting up semi-high performance computing environments. July 25, 2014

BLAS. Basic Linear Algebra Subprograms

How to compile Fortran program on application server

Parallelism V. HPC Profiling. John Cavazos. Dept of Computer & Information Sciences University of Delaware

Mathematical Libraries and Application Software on JUQUEEN and JURECA

Mills HPC Tutorial Series. Mills HPC Basics

Programming Environment 4/11/2015

Achieve Better Performance with PEAK on XSEDE Resources

Sunday, February 19, 12

Intel Math Kernel Library 10.3

Some notes on efficient computing and high performance computing environments

Faster Code for Free: Linear Algebra Libraries. Advanced Research Compu;ng 22 Feb 2017

Mathematical libraries at the CHPC

Introduction to Linux Scripting (Part 2) Brett Milash and Wim Cardoen CHPC User Services

Mathematical Libraries and Application Software on JUQUEEN and JURECA

Short LAPACK User s Guide

Intel Performance Libraries

Modules and Software. Daniel Caunt Harvard FAS Research Computing

Using Intel Math Kernel Library with MathWorks* MATLAB* on Intel Xeon Phi Coprocessor System

Deploying (community) codes. Martin Čuma Center for High Performance Computing University of Utah

Advanced School in High Performance and GRID Computing November Mathematical Libraries. Part I

Fast Multiscale Algorithms for Information Representation and Fusion

Performance comparison and optimization: Case studies using BenchIT

Programming LRZ. Dr. Volker Weinberg, RRZE, 2018

High-Performance Libraries and Tools. HPC Fall 2012 Prof. Robert van Engelen

Optimizing Parallel Electronic Structure Calculations Through Customized Software Environment

NEW ADVANCES IN GPU LINEAR ALGEBRA

Mathematical Libraries and Application Software on JUROPA, JUGENE, and JUQUEEN. JSC Training Course

Resources for parallel computing

Our new HPC-Cluster An overview

Advanced School in High Performance and GRID Computing November 2008

Introduction to Linux Part 3: Advanced Scripting and Compiling. Albert Lund CHPC User Services

Workshop on High Performance Computing (HPC08) School of Physics, IPM February 16-21, 2008 HPC tools: an overview

Linux environment. Graphical interface X-window + window manager. Text interface terminal + shell

Effective Use of CCV Resources

2.7 Numerical Linear Algebra Software

Introduction to Compilers and Optimization

APACHE TROUBLESHOOTING. Or, what to do when your vhost won t behave

BLAS and LAPACK + Data Formats for Sparse Matrices. Part of the lecture Wissenschaftliches Rechnen. Hilmar Wobker

Issues In Implementing The Primal-Dual Method for SDP. Brian Borchers Department of Mathematics New Mexico Tech Socorro, NM

Installation of OpenMX

Compiling applications for the Cray XC

What is Research Computing?

How to get Access to Shaheen2? Bilel Hadri Computational Scientist KAUST Supercomputing Core Lab

Intel Math Kernel Library (Intel MKL) Latest Features

PRACE PATC Course: Intel MIC Programming Workshop, MKL. Ostrava,

Code Optimization. Brandon Barker Computational Scientist Cornell University Center for Advanced Computing (CAC)

UNIX Shell Programming

EE/CSCI 451 Introduction to Parallel and Distributed Computation. Discussion #4 2/3/2017 University of Southern California

Introduction to Numerical Libraries for HPC. Bilel Hadri. Computational Scientist KAUST Supercomputing Lab.

Introduction to CINECA HPC Environment

MAGMA a New Generation of Linear Algebra Libraries for GPU and Multicore Architectures

Intel C++ Compiler User's Guide With Support For The Streaming Simd Extensions 2

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah

MODULAR ENVIRONMENT MANAGEMENT WITH VALET. Dr. Jeffrey Frey University of Delaware, IT

PRACE PATC Course: Intel MIC Programming Workshop, MKL LRZ,

Dense Linear Algebra on Heterogeneous Platforms: State of the Art and Trends

LAPACK. Linear Algebra PACKage. Janice Giudice David Knezevic 1

Intel Math Kernel Library for Linux*

Intel MPI Cluster Edition on Graham A First Look! Doug Roberts

Introduction to GALILEO

Revision 1.1. Copyright 2011, XLsoft K.K. All rights reserved. 1

Introduction to Parallel Programming. Martin Čuma Center for High Performance Computing University of Utah

Distributed Dense Linear Algebra on Heterogeneous Architectures. George Bosilca

Using the MaRC2 HPC Cluster

Lab MIC Offload Experiments 7/22/13 MIC Advanced Experiments TACC

Optimization and Scalability

PGI Visual Fortran Release Notes. Version The Portland Group

COMPILING FOR THE ARCHER HARDWARE. Slides contributed by Cray and EPCC

Using the computational resources at the GACRC

Eigen. a c++ linear algebra library. Gaël Guennebaud. [ Séminaire Aristote 15 May 2013

Introduction to PICO Parallel & Production Enviroment

Debugging, Profiling and Optimising Scientific Codes. Wadud Miah Research Computing Group

Introduction to Linux

Embedded Systems Programming

Intel Math Kernel Library (Intel MKL) BLAS. Victor Kostin Intel MKL Dense Solvers team manager

ATLAS (Automatically Tuned Linear Algebra Software),

Compiler Options. Linux/x86 Performance Practical,

PRACE PETSc Tutorial IT4Innovations, May 10-11, 2018

PGDBG Installation Guide

Linux User Environment. Robert McLay

Intel MIC Architecture. Dr. Momme Allalen, LRZ, PRACE PATC: Intel MIC&GPU Programming Workshop

The Cray Programming Environment. An Introduction

Koronis Performance Tuning 2. By Brent Swartz December 1, 2011

An Introduction to Multicore Programming and the NAG Library for SMP & multicore

Installing the Quantum ESPRESSO distribution

Department of Computer Science and Engineering Yonghong Yan

MAGMA. LAPACK for GPUs. Stan Tomov Research Director Innovative Computing Laboratory Department of Computer Science University of Tennessee, Knoxville

Using Compute Canada. Masao Fujinaga Information Services and Technology University of Alberta

TASKING Performance Libraries For Infineon AURIX MCUs Product Overview

Optimization and Scalability. Steve Lantz Senior Research Associate Cornell CAC

Study and implementation of computational methods for Differential Equations in heterogeneous systems. Asimina Vouronikoy - Eleni Zisiou

Automatic Performance Tuning. Jeremy Johnson Dept. of Computer Science Drexel University

Self Adapting Numerical Software (SANS-Effort)

Intel Math Kernel Library

Cray Scientific Libraries. Overview

Intel Math Kernel Library for macos*

Transcription:

Institut de calcul intensif et de stockage de masse Compilers & Optimized Librairies Modules Environment.bashrc env $PATH... Compilers : GNU, Intel, Portland Memory considerations : size, top, ulimit Hello world! exercice October 21th 2015 Bernard Van Renterghem

modules [bvr@hmem00 ~]$ module avail /usr/share/modules/modulefiles Armadillo/2.4.4 goolf 1.4.10 Python 2.7.3 ATLAS/3.8.4 gompi 1.1.0 no OFED LAPACK 3.4.0 Autoconf/2.69 GCC 4.8.2 Autoconf/2.69 goolf 1.4.10 Automake/1.13.4 goolf 1.4.10 Automake/1.14 GCC 4.8.2 Bert/0.9.0 82 Bert/0.9.1 BLACS/1.1 gompi 1.1.0 no OFED blas/gcc/3.2.1 [...]

modules [bvr@hmem00 ~]$ module avail GCC /usr/share/modules/modulefiles GCC/4.6.3 GCC/4.7.2 GCC/4.8.1 GCC/4.8.2 [bvr@hmem00 ~]$ module avail intel /usr/share/modules/modulefiles intel/clusterstudio/2013.0.028 intel/clusterstudio/advisor/2013.1 intel/clusterstudio/amplifier/2013.5 [...] [bvr@hmem00 ~]$ module avail pgi /usr/share/modules/modulefiles pgi/11.2 1

modules [bvr@hmem00 ~]$ module help Modules Release 3.2.10 2012 12 21 (Copyright GNU GPL v2 1991): Usage: module [ switches ] [ subcommand ] [subcommand args ] Switches: [...] add load rm unload list purge whatis...

environment [bvr@hmem00 ~]$ more.bash_profile #.bash_profile # Get the aliases and functions if [ f ~/.bashrc ]; then. ~/.bashrc fi # User specific environment and startup programs PATH=$PATH:$HOME/bin export PATH

environment [bvr@hmem00 ~]$ more.bashrc #.bashrc # Source global definitions if [ f /etc/bashrc ]; then. /etc/bashrc fi # User specific aliases and functions module load intel/compilerpro [bvr@hmem00 ~]$ alias

environment [bvr@hmem00 ~]$ env MKLROOT=/usr/local/intel/ics_2013.0.028/composer_xe_2013.1.117/mkl MANPATH=/usr/local/intel/ics_2013.0.028/composer_xe_2013.1.117/man/en_US::/usr/share/man GLOBALSCRATCH=/globalfs/bvr HOSTNAME=hmem00.cism.ucl.ac.be IPPROOT=/usr/local/intel/ics_2013.0.028/composer_xe_2013.1.117/ipp INTEL_LICENSE_FILE=/opt/flexlm SHELL=/bin/bash TERM=xterm 256color HISTSIZE=1000 TMPDIR=/scratch [...]

environment [bvr@hmem00 ~]$ echo $PATH which ifort module purge which ifort [bvr@hmem00 ~]$ echo $MANPATH man k gcc

icc [bvr@hmem00 ~]$ icc help Intel(R) C++ Compiler Help ========================== Intel(R) Compiler includes compiler options that optimize for instruction sets that are available in both Intel(R) and non Intel microprocessors, but may perform additional optimizations for Intel microprocessors than for non Intel microprocessors. In addition, certain compiler options for Intel(R) Compiler are reserved for Intel microprocessors. For a detailed description of these compiler options, including the instructions they implicate, please refer to "Intel(R) Compiler User and Reference Guides > Compiler Options." usage: icc [options] file1 [file2...] icpc [options] file1 [file2...]

icc [bvr@hmem00 ~]$ icc help Optimization O0 O1 O2 O3 Ofast enable xhost O3 ipo no prec div static Code Generation x<code> SSE4.1 AVX [bvr@hmem00 ~]$ icc xavx flops.c [bvr@hmem00 ~]$./a.out Fatal Error: This program was not built to run in your system. Please verify that both the operating system and the processor support Intel(R) AVX.

icc [bvr@hmem00 ~]$ icc help Linking/Linker L<dir> shared static [bvr@hmem00 ~]$ icc fast flops.c o flops.fast [bvr@hmem00 ~]$ icc O0 flops.c o flops.o0 [bvr@hmem00 ~]$ ls l flops.fast flops.o0 rwxrwxr x 1 bvr bvr 638944 flops.fast rwxrwxr x 1 bvr bvr 12725 flops.o0 ldd LD_LIBRARY_PATH file [bvr@hmem00 ~]$ file flops.fast flops.fast: ELF 64 bit LSB executable, x86 64, version 1 (SYSV), statically linked, for GNU/Linux 2.6.18, not stripped

ex. flops.c Practice yourself : cp ~bvr/flops.c ~ Compile with gcc then icc then pgcc

Memory considerations cp ~bvr/par512mb.f ~ ifort O0 par512mb.f size a.out text data bss dec hex filename 569154 20304 536978152 537567610 200aa17a a.out bss = block started by symbol (uninitialized global data) Top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 5011 bvr 20 0 525m 512m 516 R 100.0 0.8 0:20.46 a.out

Memory considerations [bvr@hmem00 ~]$ ulimit a core file size (blocks, c) 0 data seg size (kbytes, d) 4647764 scheduling priority ( e) 0 file size (blocks, f) unlimited pending signals ( i) 515034 max locked memory (kbytes, l) unlimited max memory size (kbytes, m) unlimited open files ( n) 1024 pipe size (512 bytes, p) 8 POSIX message queues (bytes, q) 819200 real time priority ( r) 0 stack size (kbytes, s) 10240 cpu time (seconds, t) unlimited max user processes ( u) 1024 virtual memory (kbytes, v) unlimited file locks ( x) unlimited

Memory considerations [bvr@hmem00 ~]$ ulimit v 600000 [bvr@hmem00 ~]$ time./a.out real 0m5.751s user 0m4.014s sys 0m1.652s [bvr@hmem00 ~]$ ulimit v 500000 [bvr@hmem00 ~]$ time./a.out Killed real 0m0.001s

Ex2. Hello world! [bvr@hmem00 ~]$ wget ftp://ftp.belnet.be/mirror/ftp.gnu.org/gnu/hello/hello-2.8.tar.gz Untar it. See INSTALL :./configure; make; make install With pgcc and icc Use prefix Change the default message into the src > advantage of the Makefile

Librairies cp -rp ~bvr/matmul ~ ACML ( AMD Opteron ) versus MKL ( Intel Math Kernel Lib) BLAS Basic Linear Algebra Subprograms LAPACK A package of higher level linear algebra routines; FFT a set of Fast Fourier Transform routines RNG a set of random number generators and statistical distribution functions.

Librairies BLAS http://www.netlib.org/blas/index.html L1 scalar, vector and vector vector operations L2 matrix vector operations L3 matrix matrix operations

Librairies LAPACK http://www.netlib.org/lapack/ provides routines for solving systems of simultaneous linear equations, least squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. The associated matrix factorizations (LU, Cholesky, QR, SVD, Schur, generalized Schur) are also provided, as are related computations such as reordering of the Schur factorizations and estimating condition numbers. Dense and banded matrices are handled, but not general sparse matrices. for real and complex matrices, in both single and double precision.

Librairies exercice See matmul.f : #if defined MATMUL c = matmul(a, b) #elif defined BLAS call dgemm('no transpose','no transpose',m,n,p, & 1.0d0, a, m, b, n, 1.0d0, c, m) With pgf90 & acml ( L/opt/acml/4.4.0/pgi64/lib/ lacml) With ifort & mkl ( lmkl )