COSC 6385 Computer Architecture - Project


Slide 1: COSC 6385 Computer Architecture - Project
Edgar Gabriel, Spring 2018

Hardware performance counters
- A set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities within computer systems
- Low overhead compared to software-based methods
- The types and meanings of hardware counters vary from one kind of architecture to another due to the variation in hardware organizations

Some of the subsequent material is based on a tutorial by P. Mucci, S. Moore, N. Smeds, "Performance Tuning Using Hardware Counter Data", Supercomputing.

Slide 2: Performance Application Programming Interface (PAPI)
- Portable API to access the hardware performance monitor counters found on most modern microprocessors
- PAPI provides multiple interfaces to the underlying counter hardware:
  1. The low-level interface manages hardware events in user-defined groups called EventSets.
  2. The high-level interface simply provides the ability to start, stop and read the counters for a specified list of events.

PAPI High-level Interface

Slide 3: High-level Interface
- Meant for application programmers wanting coarse-grained measurements
- Not thread-safe
- Calls the low-level API
- Easier to use and requires less setup (additional code) than the low-level interface
- Allows only PAPI preset events: a standard set of events deemed most relevant for application performance tuning
- Run papi_avail to see the list of PAPI preset events available on a platform

High-level API (C interface):
- PAPI_start_counters()
- PAPI_read_counters()
- PAPI_stop_counters()
- PAPI_accum_counters()
- PAPI_num_counters()
- PAPI_flops()

Slide 4: Setting up the High-level Interface
- int PAPI_num_counters(void)
  - Initializes PAPI (if needed)
  - Returns the number of hardware counters
- int PAPI_start_counters(int *events, int len)
  - Initializes PAPI (if needed)
  - Sets up an event set with the given counters
  - Starts counting in the event set
- int PAPI_library_init(int version)
  - Low-level routine implicitly called by the above

Controlling the counters:
- PAPI_stop_counters(long_long *vals, int alen): stop counters and put counter values in array
- PAPI_accum_counters(long_long *vals, int alen): accumulate counters into array and reset
- PAPI_read_counters(long_long *vals, int alen): copy counter values into array and reset counters
- PAPI_flops(float *rtime, float *ptime, long_long *flpins, float *mflops): wall-clock time, process time, FP instructions since start, Mflop/s since last call
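A minimal self-contained sketch using PAPI_flops() (available in PAPI 5.x, assuming the floating-point presets are supported on the machine; do_work() is a stand-in kernel):

    #include <stdio.h>
    #include "papi.h"

    static void do_work(void)                 /* stand-in compute kernel */
    {
        volatile double s = 0.0;
        for (int i = 0; i < 1000000; i++)
            s += i * 0.5;
    }

    int main(void)
    {
        float rtime, ptime, mflops;
        long long flpins;

        /* First call initializes PAPI and starts the counters */
        if (PAPI_flops(&rtime, &ptime, &flpins, &mflops) != PAPI_OK)
            return 1;

        do_work();

        /* Second call reports values accumulated since the first call */
        if (PAPI_flops(&rtime, &ptime, &flpins, &mflops) != PAPI_OK)
            return 1;
        printf("real time: %f s, FP ins: %lld, rate: %f Mflop/s\n",
               rtime, flpins, mflops);
        return 0;
    }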

Slide 5: PAPI High-level Example

    #include "papi.h"
    #define NUM_EVENTS 2
    long_long values[NUM_EVENTS];
    int Events[NUM_EVENTS] = {PAPI_TOT_INS, PAPI_TOT_CYC};

    /* Start the counters */
    PAPI_start_counters(Events, NUM_EVENTS);

    do_work();   /* what we are monitoring */

    /* Stop the counters and store the results */
    retval = PAPI_stop_counters(values, NUM_EVENTS);

Return codes:

    Name             Description
    PAPI_OK          No error
    PAPI_EINVAL      Invalid argument
    PAPI_ENOMEM      Insufficient memory
    PAPI_ESYS        A system/C library call failed; check the errno variable
    PAPI_ESBSTR      Substrate returned an error, e.g. an unimplemented feature
    PAPI_ECLOST      Access to the counters was lost or interrupted
    PAPI_EBUG        Internal error
    PAPI_ENOEVNT     Hardware event does not exist
    PAPI_ECNFLCT     Hardware event exists, but resources are exhausted
    PAPI_ENOTRUN     Event or event set is currently not counting
    PAPI_EISRUN      Event or event set is currently running
    PAPI_ENOEVST     No event set available
    PAPI_ENOTPRESET  Argument is not a preset
    PAPI_ENOCNTR     Hardware does not support counters
    PAPI_EMISC       Any other error occurred
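Every PAPI call returns one of these codes; PAPI_strerror() (part of the PAPI API) converts a code into a readable message. A minimal sketch, continuing the example above:

    retval = PAPI_stop_counters(values, NUM_EVENTS);
    if (retval != PAPI_OK)
        fprintf(stderr, "PAPI error %d: %s\n", retval, PAPI_strerror(retval));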

Slide 6: PAPI Low-level Interface
- Increased efficiency and functionality over the high-level PAPI interface
- About 40 functions
- Obtain information about the executable and the hardware
- Thread-safe
- Fully programmable
- Callbacks on counter overflow

Slide 7: Low-level Functionality
- Library initialization: PAPI_library_init, PAPI_thread_init, PAPI_shutdown
- Timing functions: PAPI_get_real_usec, PAPI_get_virt_usec, PAPI_get_real_cyc, PAPI_get_virt_cyc
- Inquiry functions
- Management functions
- Simple lock: PAPI_lock / PAPI_unlock

Event sets
- An event set contains key information:
  - Which low-level hardware counters to use
  - The most recently read counter values
  - The state of the event set (running / not running)
  - Option settings (e.g., domain, granularity, overflow, profiling)
- Event sets can overlap if they map to the same hardware counter set-up; this allows inclusive/exclusive measurements

Slide 8: Event Set Operations
- Event set management: PAPI_create_eventset, PAPI_add_event[s], PAPI_rem_event[s], PAPI_destroy_eventset
- Event set control: PAPI_start, PAPI_stop, PAPI_read, PAPI_accum
- Event set inquiry: PAPI_query_event, PAPI_list_events, ...

Simple example:

    #include "papi.h"
    #define NUM_EVENTS 2
    int Events[NUM_EVENTS] = {PAPI_FP_INS, PAPI_TOT_CYC};
    int EventSet = PAPI_NULL;
    long_long values[NUM_EVENTS];

    /* Initialize the library */
    retval = PAPI_library_init(PAPI_VER_CURRENT);

    /* Allocate space for the new event set and do setup */
    retval = PAPI_create_eventset(&EventSet);

    /* Add flops and total cycles to the event set */
    retval = PAPI_add_events(EventSet, Events, NUM_EVENTS);

    /* Start the counters */
    retval = PAPI_start(EventSet);

    do_work();   /* what we want to monitor */

    /* Stop counters and store results in values */
    retval = PAPI_stop(EventSet, values);

Slide 9: Overflow Handling
- Generates an overflow signal after every 'threshold' events have been counted
- Each counter has to be registered separately
- The value of each registered hardware counter is maintained separately
- (LONG_)LONG_MAX is 2,147,483,647 on 32 bit and 9,223,372,036,854,775,807 on 64 bit
- overflow_handler(): user-defined function to process overflow events; the function will be called by the PAPI library every time the threshold is reached
- overflow_vector: a bit-array that can be processed to determine which event(s) caused the overflow, e.g. using PAPI_get_overflow_event_index()
- Software vs. hardware overflow: if the processor does not support hardware overflow, software emulates it by periodically checking the counter values
  - Software overflow handling is less accurate and more expensive than hardware handling
  - Hardware overflow is often implemented using a zero-crossing algorithm: the counter is preloaded with the negative of the threshold and incremented until it crosses zero, which raises the interrupt
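A minimal sketch of registering an overflow handler with the low-level API (assumes PAPI 5.x; the threshold of 1,000,000 instructions and the do_work() kernel are placeholders, and error checking is omitted):

    #include <stdio.h>
    #include "papi.h"

    /* Called by the PAPI library every time the threshold is reached */
    void handler(int EventSet, void *address,
                 long long overflow_vector, void *context)
    {
        fprintf(stderr, "overflow at %p, vector 0x%llx\n",
                address, overflow_vector);
    }

    int EventSet = PAPI_NULL;
    long_long values[1];

    PAPI_library_init(PAPI_VER_CURRENT);
    PAPI_create_eventset(&EventSet);
    PAPI_add_event(EventSet, PAPI_TOT_INS);

    /* Register the handler: fire after every 1,000,000 instructions */
    PAPI_overflow(EventSet, PAPI_TOT_INS, 1000000, 0, handler);

    PAPI_start(EventSet);
    do_work();
    PAPI_stop(EventSet, values);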

Slide 10: 1st Assignment

Rules
- Each student should deliver:
  - Source code (.c and .h files). Please: no .o files and no executables!
  - Documentation (pdf or docx formats accepted)
- Deliver electronically on Blackboard
- Expected by Tuesday, March 6, 11:59pm
- In case of questions: ask early, not the day before the submission is due

About the project
- You are given the source code for a matrix-multiply operation (file hw-matmul.c). The code contains a trivial implementation of the matrix-multiply operation and a blocked implementation
- The blocked implementation is executed with block sizes of 16, 32, 64 and 128
- You can compile the C file, e.g. with

    cc -O3 hw-matmul.c -o hw-matmul

- Once you have added the PAPI functions:

    cc -O3 hw-matmul.c -o hw-matmul -I/opt/papi/5.6.0/include -L/opt/papi/5.6.0/lib64 -lpapi

- Run: srun ./hw-matmul <matrix-dimension>

Slide 11
- Determine the execution time separately for each of the 5 versions of the matrix-multiply operation (trivial, block size 16, block size 32, block size 64, block size 128) for two matrix sizes (512 and 1024)
- Determine the number of L1 cache misses and the L1 cache miss rate separately for all 5 versions for both matrix sizes
- Determine the number of L2 cache misses and the L2 cache miss rate separately for all 5 versions for both matrix sizes
- Determine the number of L3 cache misses and the L3 cache miss rate separately for all 5 versions for both matrix sizes
- Add the required calls to the PAPI library to the code to determine these properties of the trivial implementation and of the blocked implementation for the different block sizes (one possible sketch follows below)
- Provide measurements for matrices of size 512 and 1024 on the whale cluster. Note that for development purposes you can of course run the code with much smaller matrices, e.g. 64
- Compare the numbers obtained both between the different implementations (e.g. block size x has a higher cache miss rate than block size y, but execution time is highest with block size z) and between matrix sizes (increasing the matrix size from 512 to 1024 increased the cache miss rate by a factor of k for block size x)
- Determine and document the cache hierarchy, sizes and characteristics of the processors used on the whale cluster (note: you can do that using PAPI)
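A minimal sketch of one way to instrument a single version of the multiply for time and L1 behavior (assuming the preset events PAPI_L1_DCM and PAPI_L1_DCA are available on whale; check with papi_avail. The function matmul_trivial() stands in for whichever version is being measured):

    #include <stdio.h>
    #include "papi.h"

    #define NUM_EVENTS 2
    int Events[NUM_EVENTS] = {PAPI_L1_DCM, PAPI_L1_DCA};  /* L1 data misses, accesses */
    long_long values[NUM_EVENTS];
    long_long t0, t1;

    PAPI_start_counters(Events, NUM_EVENTS);
    t0 = PAPI_get_real_usec();

    matmul_trivial(A, B, C, n);          /* placeholder for one of the 5 versions */

    t1 = PAPI_get_real_usec();
    PAPI_stop_counters(values, NUM_EVENTS);

    /* Miss rate = misses / accesses */
    printf("time: %lld us, L1 miss rate: %f\n",
           t1 - t0, (double)values[0] / (double)values[1]);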

Slide 12
- It is ok to submit different code for determining time, L1 cache behavior, L2 cache behavior and L3 cache behavior
- Make sure you run your tests multiple times, and document how often you ran them and whether you show the average, minimum, maximum, etc.
- Comment on your findings on how the parameter values change with the block sizes for each matrix size
- Both graphs and tables are ok for discussing your results

Notes
- The PAPI version installed on whale is 5.6.0
- On the front-end node you can find tons of examples in C and Fortran on how to use PAPI in /opt/papi/5.6.0/ctests, e.g.:
  - low-level.c -> how to use the low-level API of PAPI
  - high-level.c -> example for the high-level API
  - memory.c -> how to extract information about the memory subsystem (e.g. cache sizes)
  - overflow_index.c -> how to handle overflow correctly
- For compiling one of these examples:

    gcc -o high-level high-level.c -I/opt/papi/5.6.0/include -L/opt/papi/5.6.0/lib64/ -lpapi -ltestlib
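For the cache-hierarchy task, ctests/memory.c shows the full walk over the memory-subsystem description; a minimal sketch of the underlying call (PAPI_get_hardware_info() is part of the low-level API; check the exact struct fields against papi.h on whale):

    #include <stdio.h>
    #include "papi.h"

    PAPI_library_init(PAPI_VER_CURRENT);

    /* Returns a pointer to a PAPI-internal description of the machine */
    const PAPI_hw_info_t *hw = PAPI_get_hardware_info();
    if (hw != NULL) {
        printf("CPU: %s\n", hw->model_string);
        printf("memory hierarchy levels: %d\n", hw->mem_hierarchy.levels);
    }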

Slide 13: 1st Assignment

The documentation should contain:
- (Brief) problem description
- Solution strategy
- Results section
  - Description of resources used
  - Description of measurements performed
  - Results (graphs + findings)

The documentation should not contain:
- A replication of the entire source code (that's why you have to deliver the sources)
- Screen shots of every single measurement you made; actually, no screen shots at all
- The slurm output files

Slide 14: How to Use a Cluster
- A cluster usually consists of a front-end node and compute nodes
- You can log in to the front-end node using ssh (from Windows or Linux machines) with the login name and the password assigned to you
- The front-end node is meant for editing and compiling - not for running jobs!
- To run your job interactively during development:

    smith@whale:~> srun ./hw-matmul 64

How to use a cluster (II)
- Once your code is correct and you would like to do the measurements, you have to submit a batch job
- The command you need is sbatch, e.g.

    sbatch -N 1 ./measurements.sh

- Your job goes into a queue and will be executed as soon as a node is available
- You can check the status of your job with squeue:

    smith@whale:~> squeue
    JOBID  PARTITION  NAME  USER   ST  TIME  NODES  NODELIST(REASON)
    489    whale            smith  R   0:02  1      whale
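A minimal sketch of what measurements.sh might contain (the #SBATCH options shown are assumptions; adjust them to your needs):

    #!/bin/bash
    #SBATCH --job-name=hw-matmul
    #SBATCH --ntasks=1

    # Run the instrumented code for both required matrix sizes;
    # the output ends up in slurm-<jobid>.out
    srun ./hw-matmul 512
    srun ./hw-matmul 1024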

Slide 15: How to Use a Cluster (III)
- The output of squeue gives you a job-id for your job
- Once your job finishes, you will have a file called slurm-<jobid>.out in your home directory, which contains all the output of your printf statements etc.
- Note: the batch script used for the job submission (e.g. measurements.sh) has to be executable. This means that after you download it from the webpage and copy it to whale, you have to type

    chmod +x measurements.sh

- Please do not edit the measurements.sh file on MS Windows. Windows does not add the UNIX end-of-line markers, and this confuses slurm when reading the file.

Notes
- PAPI documentation:
- If you need hints on how to use a UNIX/Linux machine through ssh:
- How to use a cluster such as whale/crill: please use the crill documentation for this reference, since for this assignment we are operating the whale cluster as an HPC cluster, not Hadoop!
