HPC Architectures evolution: the case of Marconi, the new CINECA flagship system. Piero Lanucara

1 HPC Architectures evolution: the case of Marconi, the new CINECA flagship system Piero Lanucara

2 BG/Q (Fermi) as a Tier-0 resource. Many advantages as a supercomputing resource: low energy consumption, limited floor-space requirements, a fast internal network, and a homogeneous architecture with a simple usage model. But the low single-core performance and the I/O structure meant that very high parallelism was necessary (at least 1024 cores). For some applications the low memory per core (1 GB) and the I/O performance were also a problem, as were the limited capabilities of the OS on the compute nodes (e.g. no interactive access). Cross-compilation, because the login nodes differ from the compute nodes, can complicate some build procedures. FERMI is scheduled to be decommissioned mid/end 2016.

3 Replacing Fermi at Cineca - considerations. A new procurement is a complicated process that takes many factors into account, but these must include (together with the price): minimum peak compute power, power consumption, floor space required, availability, disk space, internal network, etc. IBM no longer offers the BlueGene range for supercomputers, so it cannot be a solution. Many computer centres are instead adopting a heterogeneous model for their clusters.

4 Replacing Fermi - the Marconi solution. Roadmap (peak performance and system power): Fermi 2 PFlops, 0.8 MW; Galileo & PICO 1.2 PFlops, 0.4 MW; Marconi A1 2.1 PFlops, 0.7 MW; Marconi A2 11 PFlops, 1.3 MW; Marconi A3 7 PFlops, 1.0 MW; System A4 (?) 50 PFlops, 3.2 MW. Per phase, the slide also lists total site power (from 1.2 to 3.2 MW), rack count (from 50 to 150 racks) and floor space (from 100 to 300 m2).

5 Marconi - high-level system characteristics (tender proposal). A1 - Broadwell (2.1 PFlops), Intel Xeon E5 v4, installed April 2016; A2 - Knights Landing (11 PFlops), KNL, September 2016; A3 - Skylake (7 PFlops expected), Intel Xeon v5, >1500 nodes, June 2017. Network: Intel OmniPath.

6 Marconi partitions: A1 - BDW, 2x18 cores @ 2.3 GHz, 1500 nodes, 2 PFs (1 PFs conventional); A2 - KNL, 68 cores @ 1.4 GHz, 3600 nodes, 11 PFs (1 PFs non-conventional); A3 - SKL, 2x20 cores @ 2.3 GHz, >1500 nodes, 7 PFs (5 PFs conventional).

7 Storage. GSS*: 5 PB scratch area (50 GB/s); by Jan 2017: 10 PB long-term storage (40 GB/s); 20 PB tape library. The compute partitions are as on the previous slide: A1 - BDW, 2x18 cores @ 2.3 GHz, 1500 nodes, 2 PFs; A2 - KNL, 68 cores @ 1.4 GHz, 3600 nodes, 11 PFs; A3 - SKL, 2x20 cores @ 2.3 GHz, >1500 nodes, 7 PFs.

8 Marconi - Compute. A1 (half reserved for EUROfusion): 1512 Lenovo NeXtScale servers -> 2 PFlops; Intel E5 v4 (Broadwell) @ 2.3 GHz; dual-socket nodes with 36 cores and 128 GByte per node. A2 (1 PFlops for EUROfusion): 3600 Intel Adams Pass servers -> 11 PFlops; Intel Xeon Phi, code name Knights Landing (KNL), @ 1.4 GHz; single-socket nodes with 96 GByte DDR4 + 16 GByte MCDRAM. A3 (large part reserved for EUROfusion): >1500 Lenovo Stark servers -> 7 PFlops; Intel E5-2680 v5 (Skylake) @ 2.3 GHz; dual-socket nodes with 40 cores and 196 GByte per node.

11 Marconi - Network. Network type: new Intel Omni-Path, the largest Omni-Path cluster in the world. Network topology: fat tree with 2:1 oversubscription, tapering at the level of the core switches only. Core switches: 5 x OPA core switches (Sawtooth Forest), 768 ports each. Edge switches: 216 OPA edge switches (Eldorado Forest), 48 ports each. Maximum system configuration: 5 (OPA) x 768 (ports) x 2 (tapering) -> 7680 servers.

12 Network layout: 5 x 768-port core switches; 216 x 48-port edge switches with 32 downlinks each; 32 nodes fully interconnected per island; 6624 compute nodes in total.

13 A1 HPL. Full-system Linpack, 1 MPI task per node. Maximum performance obtained with Turbo OFF (with Turbo ON the processors throttle). June 2016 Top500: number 46.

14 A2 HPL. Full-system Linpack: 3556 nodes, 1 MPI task per node. Maximum performance with Hyper-Threading OFF. November 2016 Top500: number 12. From the HPL output: HPL_pdgesv() started Fri Nov 4 23:10 and ended Sat Nov 5 06:33; the residual check ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N) PASSED; 1 test completed and passed the residual checks, 0 tests failed, 0 tests skipped.

15 Intel Xeon Phi KNC. An accelerator (like GPUs) but more similar to a conventional multicore CPU. The current version, Knights Corner (KNC), has around 60 cores, 8-16 GB of RAM and a 512-bit vector unit per core. The cores are connected in a ring topology, and MPI is possible. There is no need to write CUDA or OpenCL: the Intel compilers will compile Fortran or C code for the MIC. 1-2 Tflops, according to the model.
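
A minimal sketch of that last point (an illustration, not the slides' code): plain C annotated with the Intel compiler's offload pragma runs the loop on the KNC coprocessor, with no CUDA or OpenCL involved; the array names and sizes are invented for the example, and the same source could instead be built as a native MIC binary.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const int n = 1000000;
    float *a = malloc(n * sizeof(float));
    float *b = malloc(n * sizeof(float));
    float *c = malloc(n * sizeof(float));
    for (int i = 0; i < n; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* Copy a and b to the coprocessor, run the loop there, copy c back. */
    #pragma offload target(mic) in(a:length(n)) in(b:length(n)) out(c:length(n))
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];

    printf("c[42] = %f\n", c[42]);
    free(a); free(b); free(c);
    return 0;
}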

16 A2: Knights Landing (KNL). A big unknown, because very few people currently have access to KNL; but we know the architecture of KNL and its differences and similarities with respect to KNC. The main differences are: KNL will be a standalone processor, not an accelerator (unlike KNC); KNL has more powerful cores and a faster internal network; on-package high-performance memory (16 GB MCDRAM).

17 Intel Xeon Phi KNC-KNL comparison.
                          KNC (Galileo)            KNL (Marconi)
  #cores                  61 (Pentium-based)       68 (Atom-based)
  Core frequency          -                        1.4 GHz
  Memory                  16 GB GDDR5              96 GB DDR4 + 16 GB MCDRAM
  Internal network        bi-directional ring      mesh
  Vectorisation           512 bit / core           2x AVX-512 / core
  Usage                   co-processor             standalone
  Performance (Gflops)    1208 (dp) / 2416 (sp)    ~3000 (dp)
  Power                   ~300 W                   ~200 W
A KNC core can be 10x slower than a Haswell core; a KNL core is expected to be 2-3x slower. There are also big differences in memory bandwidth.

20 Coming next: A3. Intel Skylake processors (mid-2017): the successor to Haswell; expect an increase in performance and power efficiency.

21 Coming next: A3

22 Marconi A1 and A2 exploitation at its best

23 Exploiting the parallel universe. Three levels of parallelism are supported by Intel hardware (see the sketch below):
  - Thread Level Parallelism: multi-thread/task performance; exposed by programming models; executes tens/hundreds/thousands of tasks concurrently.
  - Vector Level Parallelism: single-thread performance; exposed by tools and programming models; operates on 4/8/16 elements at a time.
  - Instruction Level Parallelism: single-thread performance; automatically exposed by hardware/tools; effectively limited to a few instructions.
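
How the first two levels are typically expressed in code, as a minimal illustrative sketch (the loop and names are assumptions, not taken from the slides): OpenMP threads provide thread-level parallelism, the OpenMP simd construct exposes vector-level parallelism, and instruction-level parallelism is then extracted automatically by the compiler and hardware.

#include <omp.h>
#include <stdio.h>

#define N (1 << 20)

static double a[N], b[N], c[N];

int main(void)
{
    /* Thread-level parallelism: iterations are split across OpenMP
       threads. Vector-level parallelism: each thread's chunk runs with
       SIMD instructions (4/8 doubles at a time, depending on the
       vector width). */
    #pragma omp parallel for simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + 2.0 * b[i];

    printf("ran with up to %d threads\n", omp_get_max_threads());
    return 0;
}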

24 A1 exploitation. A1: Broadwell nodes, similar to the Haswell cores present on Galileo. Expect only a small difference in single-core performance with respect to Galileo, but a big difference compared to Fermi. More cores per node (36) should mean better OpenMP performance, but MPI performance will also improve (faster network). Life is much easier for SPMD programming models (a minimal hybrid sketch follows). Cores/node: 36. Memory/node: 128 GB. Use SIMD vectorization.
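
A minimal hybrid MPI + OpenMP sketch of the SPMD pattern suggested here (illustrative only; the rank/thread split is an assumption): a few MPI ranks per node with OpenMP threads filling the 36 Broadwell cores, compiled with an MPI wrapper plus the OpenMP flag.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;
    /* Ask for an MPI library that tolerates threaded regions. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        #pragma omp single
        printf("rank %d runs %d OpenMP threads\n",
               rank, omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}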

25 Single Instruction Multiple Data (SIMD) vectorization. A technique for exploiting VLP on a single thread: operate on more than one element at a time, which can decrease instruction counts significantly. Elements are stored in SIMD registers (vectors), and the code needs to be vectorized; vectorization is usually applied to inner loops, and main and remainder loops are generated.
  Scalar loop:             for (int i = 0; i < N; i++) c[i] = a[i] + b[i];
  SIMD loop (4 elements):  for (int i = 0; i < N; i += 4) c[i:4] = a[i:4] + b[i:4];
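
For illustration, a compilable C version of the 4-element SIMD loop above, written here with SSE intrinsics (an assumption for the example; in practice the compiler, or an OpenMP simd directive, generates the equivalent): the main loop processes four floats per iteration and a scalar remainder loop handles the tail, as described on the slide.

#include <xmmintrin.h>   /* SSE: 128-bit registers, 4 floats */

void vec_add(const float *a, const float *b, float *c, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {                 /* SIMD main loop */
        __m128 va = _mm_loadu_ps(&a[i]);
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&c[i], _mm_add_ps(va, vb));
    }
    for (; i < n; i++)                           /* scalar remainder loop */
        c[i] = a[i] + b[i];
}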

26 A2 exploitation. A2: KNL, (more) similar to the KNC coprocessor present on Galileo, but with remarkable differences: high-bandwidth MCDRAM available; AVX-512 ISA; binary compatible with standard Xeon. Mixing MPI and OpenMP is still the best choice (performance is less dependent on OpenMP). Cores/node: 68. Memory/node: 96 GByte DDR4 + 16 GByte MCDRAM. Use SIMD vectorization.

27 A2: MCDRAM. Memory bandwidth is one of the common performance bottlenecks in HPC. To meet the demand for memory bandwidth, KNL has an on-package high-bandwidth memory (HBM) based on multi-channel dynamic random access memory (MCDRAM). This memory can deliver up to 5x the performance (~400 GB/s) of the DDR4 memory on the same platform (~90 GB/s).

28 A2: MCDRAM. The HBM on KNL can be used as 1. a last-level cache (LLC) or 2. addressable memory. The configuration is determined at boot time by choosing, in the BIOS settings, between three MCDRAM modes: 1. Flat mode, 2. Cache mode, 3. Hybrid mode.

29 A2: MCDRAM The best mode to use will depend on the application.

30 A2: Using HBM as addressable memory. Two methods for this: the numactl tool, which works best if the whole app can fit in MCDRAM, and the memkind library, using library calls or compiler directives, which needs source modification.

31 A2: Using numactl to access MCDRAM. Run "numactl --hardware" to see the NUMA configuration of your system and look for the node that has memory only (the MCDRAM). If the total memory footprint of your app is smaller than the size of MCDRAM (check the RSS value with "ps -C myapp u"), use numactl to allocate all of its memory from MCDRAM: numactl --membind=mcdram_id myapp, where mcdram_id is the ID of the MCDRAM "node". If the total memory footprint of your app is larger than the size of MCDRAM, you can still use numactl to allocate part of your app in MCDRAM: numactl --preferred=mcdram_id myapp (allocations that don't fit into MCDRAM spill over to DDR) or numactl --interleave=nodes myapp (allocations are interleaved across all the nodes).
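
As a hedged illustration of the numactl approach (the program, sizes and node IDs below are assumptions, not part of the slides): a small STREAM-like triad whose whole footprint fits in MCDRAM, so it can be bound entirely to either memory from the command line and the bandwidth gap quoted earlier becomes visible.

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 27)   /* about 128M doubles per array, ~1 GiB each */

int main(void)
{
    double *a = malloc(N * sizeof(double));
    double *b = malloc(N * sizeof(double));
    double *c = malloc(N * sizeof(double));
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];          /* triad: 2 reads + 1 write */
    t = omp_get_wtime() - t;

    printf("triad bandwidth: %.1f GB/s\n",
           3.0 * N * sizeof(double) / t / 1e9);
    free(a); free(b); free(c);
    return 0;
}

Running it as numactl --membind=<mcdram_id> ./triad and then numactl --membind=<ddr_id> ./triad (with the node IDs reported by numactl --hardware) compares MCDRAM and DDR directly.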

32 A2: Using memkind to access MCDRAM. The memkind library is a user-extensible heap manager built on top of jemalloc, a C library of general-purpose memory allocation functions. The library is generalizable to any NUMA architecture, but on Knights Landing processors it is used primarily for manual allocation to HBM. It provides special allocators for C/C++ and has limited support for Fortran.

33 A2: Using memkind - C case.
Allocate 1000 floats from DDR:
  float *fv;
  fv = (float *)malloc(sizeof(float) * 1000);
Allocate 1000 floats from MCDRAM (hbwmalloc.h, link with -lmemkind):
  float *fv;
  fv = (float *)hbw_malloc(sizeof(float) * 1000);
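
A slightly fuller, self-contained sketch of the same path (the error handling and DDR fallback are additions for illustration, not from the slides): hbw_check_available() from hbwmalloc.h reports whether high-bandwidth memory can actually be allocated, and the program falls back to ordinary malloc otherwise; link with -lmemkind.

#include <stdio.h>
#include <stdlib.h>
#include <hbwmalloc.h>

int main(void)
{
    size_t n = 1000;
    float *fv;

    if (hbw_check_available() == 0) {
        fv = (float *)hbw_malloc(n * sizeof(float));   /* MCDRAM */
        printf("allocated %zu floats in MCDRAM\n", n);
        /* ... work on fv ... */
        hbw_free(fv);
    } else {
        fv = (float *)malloc(n * sizeof(float));       /* DDR fallback */
        printf("MCDRAM not available, using DDR\n");
        free(fv);
    }
    return 0;
}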

34 A2: Using memkind - Fortran case.
      ! Declare the arrays as allocatable
      REAL, ALLOCATABLE :: A(:), B(:), C(:)
      ! Intel directive: place A in fast memory (MCDRAM)
      !DEC$ ATTRIBUTES FASTMEM :: A
      NSIZE = 1024
      ! Allocate array 'A' from MCDRAM
      ALLOCATE (A(1:NSIZE))
      ! Allocate arrays that will come from DDR
      ALLOCATE (B(NSIZE), C(NSIZE))

37 Porting applications onto Marconi A1 and A2. Application developers must exploit vectorisation (SIMD) and/or MPI (OpenMP) processes (threads), and possibly the fast memory of KNL (MCDRAM). From the user perspective, success will depend on how well software tools and applications are able to exploit the KNL architecture. Key idea: use a proxy (KNC, ...): SPECFEM3D_GLOBE. There are already reasonable results with KNC in the framework of this activity: a good amount of vectorisation (enabling the FORCE_VECTORIZATION preprocessing option and SIMD optimization), suitable for KNC and the future KNL, and scaling to a high number of OpenMP threads (more than 60 on KNC). It is worth noting that, so far, KNC has not been widely supported by geophysical application developers and users; a remarkable exception is the SPECFEM3D_GLOBE software (CIG repository), where the native version is maintained and tested. Again, this should be fine for its KNL startup.

38 Good practice: vectorization (SPECFEM3D_GLOBE KNC investigation). SPECFEM3D_GLOBE, Regional_MiddleEast test case (forward simulation): the slide tabulates the elapsed times on Haswell (Galileo) and KNC (Galileo), the speedup with respect to Haswell, and a no-vectorisation comparison showing a roughly 2x slowdown factor with respect to the vectorised runs. Based on a 4-node Galileo partition (16 MPI processes, with 4 and 60 OpenMP threads on Haswell and KNC respectively).

39 Good practice: choosing MCDRAM memory modes. MCDRAM cache mode should be a good choice for most applications, but applications with cache-unfriendly data are candidates for the other memory modes. In that case, try to identify what to put in MCDRAM: timings with selected data items allocated in fast/slow memory (intrusive), or memory profiling (with the VTune Amplifier tool).

40 Conclusions. Marconi A1: moderate improvements over recent years, but a big improvement compared to Fermi. High expectations for Marconi A2 (KNL) performance. KNC paves the way to higher performance: try to manage domain parallelism, increase threading, exploit data parallelism (vectorisation) and improve data locality (a new chance: use the on-package memory).
