Vincent C. Betro, Ph.D. NICS March 6, 2014


2 NSF Acknowledgement This material is based upon work supported by the National Science Foundation under Grant Number. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

3 Programming Models
Native Mode: everything runs on the MIC. May have issues with libraries that do not exist on the card and need to be copied over (e.g., MKL).
Offload Mode: the serial portion runs on the host; parallel portions are offloaded and run on the MIC.

4 Offload Mode Code starts running on the host, and regions designated for offload via pragmas are run on the MIC card when encountered. The host CPU and the MIC cards do not share memory in hardware, so data is passed to and from the MIC card either explicitly or implicitly (C/C++ only).
C/C++ syntax: #pragma offload <clauses> <statement>
Fortran syntax: !dir$ offload <clauses> <statement>
The statement immediately following the offload pragma/directive is run on a coprocessor.
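As a minimal, self-contained C sketch of this model (the array name and size are illustrative, not from the slides):

#include <stdio.h>

int main(void)
{
    double data[1000];
    int i;

    /* The block after the pragma executes on the coprocessor; a named
       array of known size like data is copied in and out by default. */
    #pragma offload target(mic)
    {
        for (i = 0; i < 1000; i++)
            data[i] = 2.0 * i;
    }

    printf("data[10] = %f\n", data[10]);
    return 0;
}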

5 I/O Proxies Standard I/O calls are proxied from the MIC card to the host. For example, opening files or displaying output to the console, while called from the MIC card, is actually performed by the host. I/O can also be performed from an offload section; this requires an NFS-mounted file system, the file pointer to be passed into the offload clause through nocopy, and permissions on the file to be +xrw for micuser.

6 Marking Variables/Functions for Use on the MIC In offload mode, the compiler needs to know ahead of time which functions will run on the MIC.
C/C++ syntax: __attribute__((target(mic)))
Fortran syntax: !dir$ attributes offload:mic :: <rtn-name>
Any variables that are to exist on both the host and the MIC also need to be known by the compiler. Whole blocks of declarations can be marked at once (C/C++ only, since virtual shared memory is used):
#pragma offload_attribute(push, target(mic))
#pragma offload_attribute(pop)
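A short C sketch of both mechanisms (the function and variable names are hypothetical):

/* Mark a single function for use on both host and MIC */
__attribute__((target(mic)))
int square(int x) { return x * x; }

/* Mark a whole block of declarations at once */
#pragma offload_attribute(push, target(mic))
int lookup_table[256];      /* visible on host and MIC */
double scale_factor;
#pragma offload_attribute(pop)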

7 Explicit Copy The programmer identifies the variables that need copying to and from the card in the offload directive.
C/C++ example: #pragma offload target(mic) in(data:length(size))
Fortran example: !dir$ offload target(mic) in(data:length(size))
Variables and pointers to be copied are restricted to scalars, structs of scalars, and arrays of scalars; i.e., double *var is allowed, but not double **var.

8 Explicit Copy Clauses and Modifiers

Clauses:
  target(name[:card_number])     Target specification: where to run the construct
  if (condition)                 Conditional offload: Boolean expression
  in(var-list [modifiers])       Inputs: copy from host to coprocessor
  out(var-list [modifiers])      Outputs: copy from coprocessor to host
  inout(var-list [modifiers])    Inputs & outputs: copy host to coprocessor and back when the offload completes
  nocopy(var-list [modifiers])   Non-copied data: data is local to the target

Modifiers:
  length(element-count-expr)     Copy N elements of the pointer's type
  alloc_if(condition)            Allocate memory to hold data referenced by the pointer if condition is TRUE
  free_if(condition)             Free memory used by the pointer if condition is TRUE
  align(expression)              Specify minimum memory alignment on the target
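A hedged C sketch combining several of these clauses and modifiers in one function (names and the size threshold are illustrative):

void scale_on_mic(double *in_buf, double *out_buf, int n)
{
    /* Offload only when n is large enough to be worth the transfer;
       length() gives the element counts for the two pointers, and
       alloc_if/free_if allocate card-side buffers on entry and free
       them when the offload completes. */
    #pragma offload target(mic:0) if(n > 1000) \
            in(in_buf  : length(n) alloc_if(1) free_if(1)) \
            out(out_buf: length(n) alloc_if(1) free_if(1))
    {
        for (int i = 0; i < n; i++)
            out_buf[i] = 2.0 * in_buf[i];
    }
}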

9 Implicit Copy This method is available only in C/C++. Sections of memory are maintained at the same virtual address on both the host and the MIC, which enables sharing of complex data structures that contain pointers. This shared memory is synchronized when entering and exiting an offload call, and only modified data is transferred between the CPU and the MIC.

10 Dynamic Memory Allocation Using Implicit Copies Special functions are needed in order to allocate and free dynamic memory for implicit copies:
_Offload_shared_malloc
_Offload_shared_aligned_malloc
_Offload_shared_free
_Offload_shared_aligned_free
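A hedged usage sketch, assuming the conventional malloc-style signatures for these functions (byte count in, pointer out); _Cilk_shared is the compiler spelling of the keyword the slides abbreviate as _Shared:

#include <stdio.h>

int _Cilk_shared *shared_data;   /* local pointer to shared data */

int main(void)
{
    int n = 1000;
    /* Allocate n ints in the shared arena */
    shared_data = (int _Cilk_shared *)_Offload_shared_malloc(n * sizeof(int));
    /* ... shared_data is now usable from both host and coprocessor ... */
    _Offload_shared_free(shared_data);
    return 0;
}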

11 The _Shared Keyword for Data and Functions

Function: int _Shared f(int x) { return x+1; }
  Versions are generated for both the CPU and the card; may be called from either side.
Global: _Shared int x = 0;
  Visible on both sides.
File/function static: static _Shared int x;
  Visible on both sides, but only to code within the file/function.
Class: class _Shared x { ... };
  Class methods, members, and operators are available on both sides.
Pointer to shared data: int _Shared *p;
  p is local (not shared), and can point to shared data.
A shared pointer: int *_Shared p;
  p is shared; it should only point at shared data.
Entire blocks of code: #pragma offload_attribute(push, _Shared) ... #pragma offload_attribute(pop)
  Mark entire files or large blocks of code _Shared using this pragma pair.

12 Offloading Using Implicit Copy Rather than using a pragma directive, the keyword _Offload is used when calling a function to be run on the MIC.
Examples:
x = _Offload function(y);
x = _Offload_to (card_number) function(y);
Note: the function needs to be defined using the _Cilk_shared keyword.
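Put together, a minimal sketch (the function name is illustrative; newer Intel compilers also spell the call keyword _Cilk_offload):

/* Defined with _Cilk_shared so versions exist for both host and MIC */
int _Cilk_shared add_one(int y)
{
    return y + 1;
}

int main(void)
{
    int x = _Offload add_one(41);   /* runs add_one() on the MIC */
    return x == 42 ? 0 : 1;
}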

13 Explicit/Implicit Copy Comparison

Offload via explicit data copying:
  Language support: Fortran, C, C++ (C++ functions may be called, but C++ classes cannot be transferred)
  Syntax: pragmas/directives, i.e., #pragma offload in C/C++ and !dir$ offload in Fortran
  Used for: offloads that transfer contiguous blocks of data

Offload via implicit data copying:
  Language support: C, C++
  Syntax: keywords _Shared and _Offload
  Used for: offloads that transfer all or parts of complex data structures, or many small pieces of data

14 Compiling Instructions For offload mode, no special compiler flag is needed. To generate host-only code (i.e., to ignore offload pragmas/directives), use the compiler flag -no-offload.
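For example, with a hypothetical source file hello.c:

icc -o hello_offload hello.c            # offload pragmas honored by default
icc -no-offload -o hello_host hello.c   # host-only binary; pragmas ignored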

15 Running Code that Offloads on Beacon Request a compute node from Beacon: qsub -I -A UT-AACE. Then locate the generated binary and execute the host binary, i.e., ./a.out

16 Useful Environment Variables The following applies only to offload-mode execution. All environment variables defined on the host are replicated on the MIC in offload mode. To modify MIC-specific values, MIC_ENV_PREFIX must be defined:
OMP_NUM_THREADS=8
OMP_STACKSIZE=16M
MIC_ENV_PREFIX=MIC_
MIC_OMP_NUM_THREADS=96
MIC_OMP_STACKSIZE=4M
For csh: setenv ENV_VARIABLE VALUE
For sh: export ENV_VARIABLE=VALUE

17 Useful Environment Variables, Part 2 OFFLOAD_REPORT can be useful when trying to debug code that offloads:
OFFLOAD_REPORT=1 gives basic information (e.g., CPU time) about whether code blocks marked for offload are running on the host or the coprocessor.
OFFLOAD_REPORT=2 gives detailed information (e.g., CPU time and data transfer) about the offload process.
Use MIC_HOST_LOG to output traces to a file, e.g., MIC_HOST_LOG=~/app/mic.log

18 Using Intel's Math Kernel Library (MKL) in Automatic Offload Mode Currently, only the following functions are automatic-offload enabled:
BLAS: ?GEMM, ?TRSM, ?TRMM, ?SYRK, and ?HERK
LAPACK: SGETRF, SPOTRF, and SGEQRF
Just call the function and the magic happens behind the scenes:

#include <mkl.h>   /* necessary to use the service functions */

/* The following must be run to use automatic offload */
mkl_mic_enable();  /* or set MKL_MIC_ENABLE in the environment */

float *A, *B, *C;  /* matrices */
sgemm(&transa, &transb, &M, &N, &K, &alpha, A, &LDA, B, &LDB, &beta, C, &LDC);

19 Using Intel's Math Kernel Library (MKL) in Automatic Offload Mode The following environment variables also need to be set:
export MKL_MIC_ENABLE=1
export OFFLOAD_DEVICES=0,1
export OFFLOAD_REPORT=2 (if you want to see the report)
Currently, Intel only supports automatic offload to mic0 and mic1.

20 Using Intel's Math Kernel Library (MKL) in Automatic Offload Mode The row or column size of the matrix must be greater than 2048, or else the call just runs on the host. Row and column sizes also need to be multiples of 16 at the moment. Compile with the -mkl compiler flag.
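Putting the last three slides together, here is a hedged, self-contained sketch of an automatic-offload SGEMM call; the matrix size (4096, a multiple of 16 and greater than 2048) and the initialization are illustrative. Compile with icc -mkl sgemm_ao.c.

#include <stdio.h>
#include <stdlib.h>
#include <mkl.h>

int main(void)
{
    int n = 4096;                 /* multiple of 16 and > 2048 */
    float alpha = 1.0f, beta = 0.0f;
    float *A = malloc((size_t)n * n * sizeof(float));
    float *B = malloc((size_t)n * n * sizeof(float));
    float *C = malloc((size_t)n * n * sizeof(float));

    for (long i = 0; i < (long)n * n; i++) {
        A[i] = 1.0f;
        B[i] = 2.0f;
    }

    mkl_mic_enable();             /* enable automatic offload */
    sgemm("N", "N", &n, &n, &n, &alpha, A, &n, B, &n, &beta, C, &n);
    printf("C[0] = %f\n", C[0]);  /* expect 2.0 * n */

    free(A); free(B); free(C);
    return 0;
}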

21 Using Intel's Math Kernel Library (MKL) in Automatic Offload Mode Functions to fine-tune automatic offload:
mkl_mic_set_workdivision(double work_div, int device_num)
  Specify how much work is done on each device as a number between 0.0 and 1.0. The sum over all devices must be 1.0; the runtime adjusts it if it is not, so you only need to set this for N-1 devices. The host is device 0 (MKL_MIC_HOST_DEVICE), the first MIC is device 1, the second is device 2, etc. Setting the division to -1 invokes automatic load balancing (MKL_MIC_AUTO_WORKDIVISION).
mkl_mic_get_workdivision(double *work_div, int device_num)
  Lets you find out how the runtime actually divided the work.
int mkl_mic_get_device_count()
  Returns how many MIC cards were detected.
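A sketch using the signatures as printed on this slide; actual MKL releases may order or type the arguments differently, so verify against mkl.h before relying on it:

#include <stdio.h>
#include <mkl.h>

int main(void)
{
    int ndev = mkl_mic_get_device_count();   /* MIC cards detected */
    printf("%d MIC card(s) found\n", ndev);

    /* Give the host (device 0) 20% of the work; the runtime
       distributes the remainder across the cards. */
    mkl_mic_set_workdivision(0.2, MKL_MIC_HOST_DEVICE);

    double wd;
    mkl_mic_get_workdivision(&wd, 1);        /* share given to card 1 */
    printf("card 1 work share: %f\n", wd);
    return 0;
}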

22 Offload Transfer This pragma/directive simply transfers variables or arrays to/from the specified target using either all in clauses or all out clauses.
Examples:
#pragma offload_transfer target(mic:0) in(var_a,var_b,var_c) in(array_1:length(8))
!dir$ offload_transfer target(mic:0) out(var_a,var_b,var_c) out(array_1:length(8))
This pragma/directive is synchronous: the next statement is executed (on the host) only after the data transfer is complete.

23 Using Offload Transfer to Allocate/Free Memory There may be times when you want to allocate/free memory on the MIC without any data transfer, so that persistent data can be used between multiple offload calls.
Allocate memory on mic0:
#pragma offload_transfer target(mic:0) nocopy(array_a:length(8) alloc_if(1) free_if(0))
!dir$ offload_transfer target(mic:0) nocopy(array_a:length(8) alloc_if(.true.) free_if(.false.))
Use the allocated memory on mic0:
#pragma offload target(mic:0) inout(array_a:length(8) alloc_if(0) free_if(0))
!dir$ offload target(mic:0) inout(array_a:length(8) alloc_if(.false.) free_if(.false.))
Free the memory on mic0:
#pragma offload_transfer target(mic:0) nocopy(array_a:length(8) alloc_if(0) free_if(1))
!dir$ offload_transfer target(mic:0) nocopy(array_a:length(8) alloc_if(.false.) free_if(.true.))
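Tying those three steps into one hedged C sketch (the function name, pointer name, and length are illustrative):

#include <stdlib.h>
#define N 8

void persistent_demo(void)
{
    double *array_a = malloc(N * sizeof(double));

    /* 1. Allocate on mic0; nocopy means no data moves */
    #pragma offload_transfer target(mic:0) \
            nocopy(array_a : length(N) alloc_if(1) free_if(0))

    /* 2. Reuse the card-side buffer across offloads; no alloc, no free */
    #pragma offload target(mic:0) \
            inout(array_a : length(N) alloc_if(0) free_if(0))
    {
        for (int i = 0; i < N; i++)
            array_a[i] = i;
    }

    /* 3. Free the card-side buffer when done */
    #pragma offload_transfer target(mic:0) \
            nocopy(array_a : length(N) alloc_if(0) free_if(1))

    free(array_a);
}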

24 Asynchronous Offload with signal()/wait() The CPU can do work while the coprocessor(s) are executing an offload statement/block; the signal/wait specifiers are used to denote asynchronous operations:
#pragma offload signal(&tag)
#pragma offload wait(&tag)
!dir$ offload signal(tag)
!dir$ offload wait(tag)
Here tag is an integer. The offload call with the wait specifier will wait until the previous offload call finishes; this is often used while data is being transferred with offload_transfer.
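A hedged C sketch of overlapping a transfer with host work (the buffer name and the doubling kernel are illustrative):

void async_demo(double *buf, int n)
{
    int tag;   /* an integer tag, as on this slide */

    /* Start the copy to mic0 and return immediately; keep the
       card-side buffer alive with free_if(0). */
    #pragma offload_transfer target(mic:0) \
            in(buf : length(n) alloc_if(1) free_if(0)) signal(&tag)

    /* ... the host can do useful work here while data is in flight ... */

    /* Runs only after the transfer above completes; copies results
       back and releases the card-side buffer. */
    #pragma offload target(mic:0) wait(&tag) \
            out(buf : length(n) alloc_if(0) free_if(1))
    {
        for (int i = 0; i < n; i++)
            buf[i] *= 2.0;
    }
}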

25 Simultaneous Computing Using OpenMP Any offloading call blocks until the statement completes. To use both the host and the MIC simultaneously, multiple threads need to be executed on the host: one or more threads contain an offload call, while other threads have the host do some work. With OpenMP, this is achieved using OpenMP task calls.

26 OpenMP Task Calls in C/C++

#pragma omp parallel
#pragma omp single
{
    #pragma omp task
    {
        #pragma offload target(mic)
        {
            <various serial code>
            #pragma omp parallel for
            for (int i = 0; i < limit; i++)
                <parallel loop body>
        }
    }
    #pragma omp task
    {
        <host code or another offload>
    }
}
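Fleshed out into a minimal complete program (array contents and sizes are illustrative); compile with icc -openmp:

#include <stdio.h>
#define N 1000000

/* a is written on the coprocessor, so it must be visible there */
__attribute__((target(mic))) double a[N];
double b[N];

int main(void)
{
    #pragma omp parallel
    #pragma omp single
    {
        #pragma omp task        /* this task blocks in the offload... */
        {
            #pragma offload target(mic) out(a)
            {
                #pragma omp parallel for
                for (int j = 0; j < N; j++)
                    a[j] = j * 0.5;
            }
        }

        #pragma omp task        /* ...while this task keeps the host busy */
        {
            for (int j = 0; j < N; j++)
                b[j] = j * 2.0;
        }
    }   /* the implicit barrier waits for both tasks */

    printf("a[1] = %f, b[1] = %f\n", a[1], b[1]);
    return 0;
}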

27 Intel's OpenMP Pi Offload Example Copy the following Intel file to the current working directory:
cp /opt/intel/composerxe_mic/samples/en_us/c++/mic_samples/intro_samplec/samplec08.c ./
Note that additional sample files are located there. Modify it so that it can be compiled and run as a standalone program: rename the function to main, and have it return a value of 0:
int main() { ... return 0; }
Compile for offload mode on the MIC (the DEBUG keyword simply prints out the value of Pi in this code):
icc -offload-build -DDEBUG -o omp_pi_offload samplec08.c

28 Offload Example, Continued Run the binary on a compute node. Try setting OFFLOAD_REPORT to 2 and running it again:
export OFFLOAD_REPORT=2
Now reset the OFFLOAD_REPORT environment variable:
export OFFLOAD_REPORT=0
Key points:
using #pragma offload target(mic) to offload the OpenMP parallel for loop
using the environment variables MIC_ENV_PREFIX and MIC_OMP_NUM_THREADS to change the number of OpenMP threads used by the MIC
using the environment variable OFFLOAD_REPORT to see timing/communication information between the CPU and the MIC

29 Select Offload Examples The offload mode allows select portions of a code to run on the Intel MIC while the rest of it runs on the host; ideally, the offload regions are highly parallel. What follows are selected offload examples, provided by Intel, that demonstrate how to move data to and from the Intel MIC cards. Intel has many offload examples located in the following directory:
/global/opt/intel/composerxe_mic/samples/en_us/c++/mic_samples/intro_samplec/
They can be copied to a directory of your choice and then compiled with make mic.

30 SampleC01 This code computes Pi on the MIC using #pragma offload:

float pi = 0.0f;
int count = 10000;
int i;

#pragma offload target(mic)
for (i = 0; i < count; i++) {
    float t = (float)((i + 0.5f) / count);
    pi += 4.0f / (1.0f + t * t);
}
pi /= count;

#pragma offload target(mic) runs the very next line (or block of code if braces are used) on the Intel MIC; in this case the whole for loop is run on the Intel MIC. Note that pi was declared outside of the offload region, and it did not need to be explicitly copied to the MIC since it is a scalar.

31 SampleC02 This code initializes two arrays on the host, and then has the Intel MIC add the arrays together and store the result in a third array:

typedef double T;
#define SIZE 1000

#pragma offload_attribute(push, target(mic))
static T in1_02[SIZE];
static T in2_02[SIZE];
static T res_02[SIZE];
#pragma offload_attribute(pop)

static void populate_02(T* a, int s);

The #pragma offload_attribute(push/pop) pair marks the block of code between them to be used on both the host and the Intel MIC. They could instead have been marked individually with __attribute__((target(mic))). Without those statements, the Intel MIC would not be able to see/use the three arrays.

32 SampleC02, Continued The sum of the two arrays is done by the Intel MIC. Note that only a single Intel MIC core is used:

void sample02()
{
    int i;
    populate_02(in1_02, SIZE);
    populate_02(in2_02, SIZE);
    #pragma offload target(mic)
    for (i = 0; i < SIZE; i++)
        res_02[i] = in1_02[i] + in2_02[i];
}

33 SampleC03 This program is similar to SampleC02, except that it avoids unnecessary data transfer:

void sample03()
{
    int i;
    populate_03(in1_03, SIZE);
    populate_03(in2_03, SIZE);
    #pragma offload target(mic) in(in1_03, in2_03) out(res_03)
    for (i = 0; i < SIZE; i++)
        res_03[i] = in1_03[i] + in2_03[i];
}

Previously, all three arrays were copied to the card at the start of the offload call and then copied back at the end of the offload call. Now, only the in1_03 and in2_03 arrays are copied to the card, and only the res_03 array is copied back.

34 SampleC04 This program is similar to the previous two samples, but now we are dealing with pointers instead of the static arrays directly:

void sample04()
{
    T *p1, *p2;
    int i, s;
    populate_04(in1_04, SIZE);
    populate_04(in2_04, SIZE);
    p1 = in1_04;
    p2 = in2_04;
    s = SIZE;
    #pragma offload target(mic) in(p1, p2:length(s)) out(res_04)
    for (i = 0; i < s; i++)
        res_04[i] = p1[i] + p2[i];
}

Since the length of a pointer is not known, it must be explicitly passed via the length modifier. res_04 is still a static array in this sample.

35 SampleC05 This program is like the last, except that the sum of the arrays, via pointers, is now stored in a pointer to the result array; this pointer needs to have its length specified as well. Also, the summation now happens in the function get_result(). get_result() did not need to be marked with __attribute__((target(mic))) because it is called by the host and not by the Intel MIC:

void sample05()
{
    T my_result[SIZE];
    populate_05(in1_05, SIZE);
    populate_05(in2_05, SIZE);
    get_result(in1_05, in2_05, my_result, SIZE);
}

static void get_result(T* pin1, T* pin2, T* res, int s)
{
    int i;
    #pragma offload target(mic) \
        in(pin1, pin2 : length(s)) \
        out(res : length(s))
    for (i = 0; i < s; i++)
        res[i] = pin1[i] + pin2[i];
}

36 SampleC07 In this program, an array of data is sent from the host to the Intel MIC in one offload call. The array values are then doubled on the MIC in a separate offload call, as long as a MIC card exists:

#define SIZE 1000
__attribute__((target(mic))) int array1[SIZE];
__attribute__((target(mic))) int send_array(int* p, int s);
__attribute__((target(mic))) void compute07(int* out, int size);

void sample07()
{
    int in_data[16] = { 1, 2, 3, 4, 5, 6, 7, 8,
                        9, 10, 11, 12, 13, 14, 15, 16 };
    int out_data[16];
    int array_sent = 0;
    int num_devices;

    // Check if coprocessor(s) are installed and available
    num_devices = _Offload_number_of_devices();

    #pragma offload target(mic : 0)
    array_sent = send_array(in_data, 16);

    #pragma offload target(mic : 0) if(array_sent) out(out_data)
    compute07(out_data, 16);
}

37 SampleC07, Continued As a reminder, __attribute__((target(mic))) makes it so both the host and the Intel MIC can see/use the variable/function. The function _Offload_number_of_devices() returns how many Intel MIC cards are available. The macro __MIC__ lets you know whether the MIC (retval of 1) or the host (retval of 0) is currently evaluating the statements:

__attribute__((target(mic))) int send_array(int* p, int s)
{
    int retval;
    int i;
    for (i = 0; i < s; i++)
        array1[i] = p[i];
#ifdef __MIC__
    retval = 1;
#else
    retval = 0;
#endif
    // Return 1 if array initialization was done on target
    return retval;
}

__attribute__((target(mic))) void compute07(int* out, int size)
{
    int i;
    for (i = 0; i < size; i++)
        out[i] = array1[i] * 2;
}

38 SampleC08 This program is like SampleC01, except that now the Pi calculation is done using an OpenMP for loop on the Intel MIC to utilize the many cores:

float pi = 0.0f;
int count = 10000;
int i;

#pragma offload target(mic)
#pragma omp parallel for reduction(+:pi)
for (i = 0; i < count; i++) {
    float t = (float)((i + 0.5f) / count);
    pi += 4.0f / (1.0f + t * t);
}
pi /= count;

39 OS Thread Affinity Mapping The Intel MIC coprocessor has N cores, each with 4 hardware thread contexts, for a total of M = 4*N threads. The OS maps the hardware threads to M procs:

OS proc    MIC core    MIC thread
0          N-1         M-1
M-3        N-1         M-4
M-2        N-1         M-3
M-1        N-1         M-2

The OS runs on proc 0, which lives on MIC core N-1. Therefore, avoid using procs 0, (M-3), (M-2), and (M-1) to avoid contention with the OS; for example, on a 60-core card M = 240, so avoid procs 0, 237, 238, and 239. This is especially important when using the offload model due to data transfer activity.

40 Additional Resources Several sample MIC programs are provided by Intel and can be found in:
/global/opt/intel/composerxe_mic/Samples/en_US/C++/mic_samples/intro_samplec
/global/opt/intel/composerxe_mic/Samples/en_US/Fortran/mic_samples/intro_samplef
Other documentation, presentations, and even a community forum can be found at the AACE Wiki, which is limited to Beacon partners due to past NDA materials.

41 Contact Vincent Betro NICS Support
