Overview of Intel Xeon Phi Coprocessor

September 20, 2013
Ritu Arora, Texas Advanced Computing Center
Email: rauta@tacc.utexas.edu

This talk is only a trailer
- A comprehensive training on running and optimizing your code for Intel Xeon Phi will be held on October 4, 2013
- Title of the training: Optimize Your Code for the Intel Xeon Phi

Intel Xeon Phi Coprocessor: What is it?
- Also known as the Intel Many Integrated Core (MIC) architecture; it leverages the x86 architecture and is built around a large die with simple circuits
- It is a highly parallel device that provides up to 61 cores, 244 threads, and 1.2 teraflops of performance
- It comes in a variety of configurations to address diverse hardware, software, workload, performance, and efficiency requirements
- The majority of the 6400 nodes on Stampede are configured with two Xeon E5-2680 processors and one Intel Xeon Phi SE10P Coprocessor (on a PCIe card)

An Intel Xeon Phi Architecture Core
- Each core includes a newly designed Vector Processing Unit (VPU)
- The VPU is a key feature of the Intel Xeon Phi Architecture-based cores
- Each vector unit contains 32 512-bit vector registers
- Fully utilizing the vector unit is critical for best performance
Source: Reference 1
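
To make the 512-bit register width concrete, here is a minimal sketch in C. It computes how many elements fit in one register and shows the kind of simple, non-overlapping loop a compiler can usually vectorize. The names `vpu_lanes` and `saxpy` are illustrative and do not come from the talk.

```c
#include <stddef.h>

/* Width of one VPU register in bits, as stated on the slide. */
#define VPU_REGISTER_BITS 512

/* Number of elements of the given size (in bytes) that fit in one register. */
size_t vpu_lanes(size_t element_bytes)
{
    return (VPU_REGISTER_BITS / 8) / element_bytes;
}

/* y[i] = a * x[i] + y[i].
   The restrict qualifiers promise the compiler that x and y do not overlap,
   which removes one common obstacle to automatic vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

With 4-byte floats one register holds 16 elements, and with 8-byte doubles it holds 8, so a scalar loop leaves most of each register unused.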

Software Stack
[Figure. Source: Reference 1]

Developing Applications for Intel Xeon Phi
- Because it is based on x86 technology, applications for Intel Xeon Phi can be developed using existing knowledge of multi-core and SIMD programming
- Programming and optimizing for Intel Xeon Phi coprocessors is similar to programming and optimizing for Intel Xeon processors
Porting applications to Intel Xeon Phi
- You can port sections of your code (written in C/C++ or Fortran) to run on Intel Xeon Phi using the offload language extensions
- You can also port your entire application to Intel Xeon Phi
- Highly parallel applications that also use SIMD operations for most of their execution get the best performance results on Intel Xeon Phi
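
As a rough illustration of "existing multi-core knowledge", the sketch below is an ordinary OpenMP reduction of the kind written for a Xeon host. The function name `dot` is an assumption made for this example. If OpenMP is not enabled at compile time, the pragma is ignored and the loop runs serially with the same result.

```c
#include <stddef.h>

/* Dot product of x and y.
   The OpenMP pragma splits the iterations across threads and combines
   the per-thread partial sums into sum at the end of the loop. */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    long i;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long)n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

The same source can be built for the host or for the coprocessor. The simple loop body also leaves the compiler free to vectorize the work each thread does.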

Development Environment: Available Compilers and Libraries
Compilers for building applications that run on Intel 64 architecture and Intel Xeon Phi Architecture:
- Intel C++ Composer XE 2013
- Intel Fortran Composer XE 2013
Libraries packaged with the compilers include:
- Intel Math Kernel Library (Intel MKL) optimized for the Intel Xeon Phi
- Intel Threading Building Blocks (Intel TBB)
- Intel Integrated Performance Primitives (Intel IPP)
Libraries packaged separately include:
- Intel MPI for Linux OS

Intel Xeon Phi Execution Models
1. Offload Execution Mode
2. Coprocessor Native Execution Mode
3. Symmetric Execution Mode

Intel Xeon Phi Execution Models: Offload Mode
- The host system (Xeon E5 processor on Stampede) offloads part or all of the computation to the coprocessor
- The application starts execution on the host
- As the computation proceeds, the host can decide to send data to the coprocessor and let the coprocessor work on it
- The host and the coprocessor may or may not work in parallel
Note: Offload code will not run on the 61st core of the Phi because the offload daemon runs there
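
A minimal sketch of the offload model, using the Intel compiler's `#pragma offload` language extension. The function name `sum_of_squares` is illustrative. Compilers without offload support ignore the pragma, so the block simply runs on the host.

```c
/* Sum of the squares of data[0..n-1].
   With an offload-capable Intel compiler, the marked block runs on the
   coprocessor: data is copied in, and result is copied in and back out.
   Other compilers ignore the pragma and run the block on the host. */
double sum_of_squares(const double *data, int n)
{
    double result = 0.0;
    int i;

    #pragma offload target(mic) in(data : length(n)) inout(result)
    {
        for (i = 0; i < n; ++i) {
            result += data[i] * data[i];
        }
    }
    return result;
}
```

The `in` and `inout` clauses are the explicit memory copy model: the programmer states which data moves to the coprocessor and which results come back.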

Intel Xeon Phi Execution Models: Coprocessor Native Execution Mode
- The coprocessor can be viewed as another compute node, and users run their applications directly on it without offload from a host system
- In order to run natively, an application has to be cross-compiled for the Xeon Phi operating environment
- Intel Composer XE provides a simple switch (-mmic) to generate cross-compiled code
- Launch the compiled code from the host, or ssh to mic0 to run it on the Phi
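
Because the same source can be built for either target, it can be useful for a program to know which build it is. The sketch below relies on the `__MIC__` macro, which the Intel compiler defines when compiling for the coprocessor (for example with -mmic). The function names are assumptions made for this example.

```c
#include <stdio.h>

/* Returns 1 when this code was compiled for the coprocessor, 0 otherwise.
   __MIC__ is defined by the Intel compiler for coprocessor builds. */
int built_for_mic(void)
{
#ifdef __MIC__
    return 1;
#else
    return 0;
#endif
}

/* Writes a greeting naming the build target into buf.
   Returns the number of characters snprintf reports. */
int describe_target(char *buf, size_t size)
{
    return snprintf(buf, size, "Hello world from %s",
                    built_for_mic() ? "Intel Xeon Phi" : "the host");
}
```

A host build of this code reports "the host". The same file compiled with -mmic would report "Intel Xeon Phi".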

Intel Xeon Phi Execution Models: Symmetric Execution Mode
- In this case the application processes run on both the host and the Intel Xeon Phi coprocessor
- They usually communicate through a message passing interface
- This execution environment treats the Xeon Phi card as another node in a heterogeneous cluster environment
Note: Load balancing between Xeon cores and Xeon Phi cores is needed
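
One simple way to think about the load-balancing note is a static split of the work in proportion to each side's aggregate speed. This sketch is not from the talk. The function `mic_share` and its relative-speed parameters are hypothetical, and in practice the speeds would have to be measured for the application.

```c
#include <stddef.h>

/* Number of work items to give to the coprocessor ranks.
   host_speed and mic_speed are relative per-rank throughputs
   (hypothetical inputs that the user would measure). */
size_t mic_share(size_t total_items, int host_ranks, int mic_ranks,
                 double host_speed, double mic_speed)
{
    double host_capacity = host_ranks * host_speed;
    double mic_capacity = mic_ranks * mic_speed;
    double total = host_capacity + mic_capacity;

    if (total <= 0.0) {
        return 0;  /* nothing can run anywhere */
    }
    /* Proportional share, rounded to the nearest whole item. */
    return (size_t)(total_items * (mic_capacity / total) + 0.5);
}
```

The host ranks receive the remaining items. A static split like this is only a starting point, since real workloads often need tuning or dynamic scheduling.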

Intel MKL Automatic Offload Model
- A few of the host Intel MKL functions are Automatic Offload-aware
- Call such functions as you normally would on the host
- If you have preceded the library call with a call to mkl_mic_enable(), Intel MKL will automatically decide at runtime whether some or all of the work required to complete the call should be divided between the host and the Intel Xeon Phi Coprocessor, depending upon problem size, the load on both processors, and other metrics
- Turn this functionality off with mkl_mic_disable()
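
The calling pattern can be sketched as follows. `HAVE_MKL` is a hypothetical build flag introduced only for this example. When it is not defined, the code falls back to a plain triple loop so that it still compiles and runs without MKL. The wrapper names are also illustrative.

```c
#ifdef HAVE_MKL   /* hypothetical flag: define it when building against Intel MKL */
#include <mkl.h>
#endif

/* Tries to switch on Automatic Offload.
   Returns the status reported by mkl_mic_enable(), or -1 when this
   build does not use MKL at all. */
int enable_automatic_offload(void)
{
#ifdef HAVE_MKL
    return mkl_mic_enable();
#else
    return -1;
#endif
}

/* c = a * b for n x n row-major single-precision matrices.
   With MKL the call looks exactly like a normal host call; whether the
   work is shared with the coprocessor is decided by the library. */
void multiply(int n, const float *a, const float *b, float *c)
{
#ifdef HAVE_MKL
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0f, a, n, b, n, 0.0f, c, n);
#else
    int i, j, k;
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (k = 0; k < n; ++k) {
                sum += a[i * n + k] * b[k * n + j];
            }
            c[i * n + j] = sum;
        }
    }
#endif
}
```

A caller would invoke `enable_automatic_offload()` once and then call `multiply` as usual. The matrix call itself does not change, which is the point of the automatic model.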

Is Intel Xeon Phi architecture for me?
[Figure. Source: Reference 2]

Sample Code (1)
Sample offload code using the explicit memory copy model can be found on Stampede at the paths below:
- C++: /opt/apps/intel/13/composer_xe_2013.0.062/samples/en_us/C++/mic_samples
- Fortran: /opt/apps/intel/13/composer_xe_2013.0.062/samples/en_us/Fortran/mic_samples/

Sample Code (2)
Sample offload code using MKL in automatic offload mode:
- /opt/apps/intel/13/composer_xe_2013.0.062/mkl/examples/mic_samples/ao_sgemm
- /opt/apps/intel/13/composer_xe_2013.0.062/mkl/examples/mic_samples/ao_sgemm_f

Native Execution Mode Example (1)
A simple hello world program in hello.c:

    //Content of hello.c
    #include <stdio.h>

    int main()
    {
        printf("Hello world from Intel Xeon Phi\n");
        return 0;
    }

Steps for compiling and running the code on Intel Xeon Phi:

    login1$ icc -mmic -o hello hello.c
    login1$ srun --pty -A A-ccsc -p normal-mic -t 1:00:00 -n 16 /bin/bash -l
    c455-002$ scp hello mic0:
    c455-002$ ssh mic0
    ~ $ ./hello
    Hello world from Intel Xeon Phi

Native Execution Mode Example (2)
- Once the code has been compiled on the host for execution on the MIC, the user can launch the executable on the MIC directly from the host, either implicitly or explicitly; there is no need to ssh to mic0 as shown in Example (1)
- This is accomplished with the help of the native application launcher on Stampede, known as micrun

    c455-002$ micrun /work/01698/rauta/training/mic_training/hello
    Hello world from Intel Xeon Phi

- The application will run on the coprocessor from the directory where it was launched on the host, will propagate the application's return value to the shell, and will automatically propagate and translate any environment variables that contain the MIC_ prefix
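
The environment-variable translation can be pictured with a small sketch. It assumes that "translate" means the MIC_ prefix is stripped, so a host variable such as MIC_OMP_NUM_THREADS would arrive on the coprocessor as OMP_NUM_THREADS. The helper `coprocessor_name` is hypothetical and is not part of micrun.

```c
#include <string.h>

/* Given a variable name as set on the host, returns the name it would
   have on the coprocessor if the MIC_ prefix is stripped (an assumption
   about the launcher's behavior). Returns NULL when the name does not
   carry the prefix or when nothing follows the prefix. */
const char *coprocessor_name(const char *host_name)
{
    static const char prefix[] = "MIC_";
    size_t len = sizeof prefix - 1;

    if (strncmp(host_name, prefix, len) == 0 && host_name[len] != '\0') {
        return host_name + len;
    }
    return NULL;
}
```

Under that assumption, a setting meant only for the coprocessor can be exported on the host with the prefix. It then does not collide with the host's own variable of the same name.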

References
1. http://software.intel.com/sites/default/files/article/335818/intel-xeon-phi-coprocessor-quick-start-developers-guide.pdf
2. http://software.intel.com/en-us/articles/is-the-intel-xeon-phi-coprocessor-right-for-me
3. http://software.intel.com/en-us/articles/intel-xeon-phi-programming-environment
4. https://www.cac.cornell.edu/vw/mic/native.aspx