ARM High Performance Computing


Eric Van Hensbergen, Distinguished Engineer, Director HPC Software & Large Scale Systems Research
IDC HPC Users Group Meeting, Austin, TX, September 8, 2016
ARM 2016

An introduction to ARM
ARM is the world's leading semiconductor intellectual property supplier. We license to over 440 partners and are present in 95% of smartphones, 80% of digital cameras, and 35% of all electronic devices. A total of 72 billion ARM cores have been shipped since 1990.
Our CPU business model: we license technology to partners, who use it to create their own system-on-chip (SoC) products. We may license an ISA (e.g. ARMv8-A) or a specific microarchitectural implementation (e.g. Cortex-A72), and our IP extends beyond the CPU.

Why ARM in HPC?
Energy efficiency
- Energy has always been a first-class design constraint at ARM. The goal in our microarchitecture IP is to maintain that power efficiency advantage as we increase performance.
- The CPU core is only one element of the equation. We are also working to explore efficiency in the memory and on-chip interconnect space.
Choice
- Independent silicon providers within a shared software ecosystem.
- Wide range of price/performance/power/size options available.
Customisation
- ARM's partners can build SoCs very quickly.
- Getting interesting in the US:
  SoC for HPC: http://www.socforhpc.org
  DARPA CRAFT: http://www.darpa.mil/news-events/2015-08-17

HPC Leadership: International Exascale Programs
United States
- Data Movement Dominates
- FastForward II
- PathForward (proposals)
Japan
- ARM-based Fujitsu Post-K supercomputer will be deployed to RIKEN
European Union
- Mont-Blanc 1, 2, & 3
- ExaNode
- Hartree Centre Deployment
- H2020-FET-HPC (proposals)
China
- Multiple licensees pursuing HPC-oriented systems

Expanding ARMv8 vector processing
ARMv7 Advanced SIMD (aka ARM NEON instructions) is now 12 years old
- Integer, fixed-point and non-IEEE single-precision float, on well-conditioned data
- 16 × 128-bit vector registers
AArch64 Advanced SIMD was an evolution
- Gained full IEEE double-precision float and 64-bit integer vector ops
- Vector register file grew from 16 × 128-bit to 32 × 128-bit
New markets for ARMv8-A are demanding more radical changes
- Gather load & scatter store
- Per-lane predication
- Longer vectors
But what is the preferred vector length?
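The three features requested by new ARMv8-A markets (gather load, scatter store, per-lane predication) can be illustrated with a scalar emulation. This is a minimal teaching sketch in plain Python, not NEON or SVE code. All function names are hypothetical, and the choices that inactive lanes load as zero and that the add keeps the first operand in inactive lanes are assumptions made for the example.

```python
# Scalar emulation of gather load, scatter store and per-lane predication.
# Function names are hypothetical; this is not real vector code.

def gather_load(memory, indices, predicate):
    """Load memory[index] into each active lane; inactive lanes read as 0."""
    return [memory[i] if active else 0 for i, active in zip(indices, predicate)]

def scatter_store(memory, indices, values, predicate):
    """Store each active lane's value to memory[index]; skip inactive lanes."""
    for i, v, active in zip(indices, values, predicate):
        if active:
            memory[i] = v

def predicated_add(a, b, predicate):
    """Per-lane add: active lanes get a + b, inactive lanes keep a."""
    return [x + y if active else x for x, y, active in zip(a, b, predicate)]

memory = [10, 20, 30, 40, 50, 60, 70, 80]
indices = [6, 0, 3, 5]                    # non-contiguous addresses, one per lane
predicate = [True, True, False, True]     # lane 2 is switched off

loaded = gather_load(memory, indices, predicate)
summed = predicated_add(loaded, [1, 1, 1, 1], predicate)
scatter_store(memory, indices, summed, predicate)
```

In this run, lane 2 never touches memory[3], which is the point of predication: a control-flow condition becomes a per-lane mask instead of a branch.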

Introducing the Scalable Vector Extension (SVE)
There is no preferred vector length
- Vector Length (VL) is a hardware choice, from 128 to 2048 bits, in increments of 128
- Vector Length Agnostic (VLA) programming adjusts dynamically to the available VL
- No need to recompile, or to rewrite hand-coded SVE assembler or C intrinsics
SVE is not an extension of Advanced SIMD
- A separate architectural extension with a new set of A64 instruction encodings
- Focus is HPC scientific workloads, not media/image processing
Amdahl says you need high vector utilisation to achieve significant speedups
- Compilers are often unable to vectorize due to intra-vector data & control dependencies
- SVE also begins to address some of the traditional barriers to auto-vectorization
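The vector-length-agnostic idea and the Amdahl remark above can be sketched in a few lines. The code below is a scalar emulation in Python, not SVE assembler or C intrinsics, and every function name is hypothetical. It assumes 64-bit (double-precision) elements, so a VL of N bits gives N // 64 lanes, and it assumes the textbook form of Amdahl's law with an ideal per-lane speedup on the vectorized fraction.

```python
# Emulation of a vector-length-agnostic (VLA) loop. The same function body
# runs unchanged for any vector length; only the lane count differs.

def valid_vector_lengths():
    """Vector lengths described above: 128 to 2048 bits in steps of 128."""
    return list(range(128, 2048 + 1, 128))

def while_lt(start, n, lanes):
    """Predicate with lane k active while start + k < n (handles the loop tail)."""
    return [start + k < n for k in range(lanes)]

def vla_daxpy(a, x, y, vl_bits):
    """Return a*x + y, processed in chunks of vl_bits // 64 lanes."""
    if vl_bits not in valid_vector_lengths():
        raise ValueError("unsupported vector length")
    lanes = vl_bits // 64          # assumption: 64-bit elements
    n = len(x)
    out = list(y)
    i = 0
    while i < n:
        pred = while_lt(i, n, lanes)
        for k, active in enumerate(pred):
            if active:             # inactive tail lanes do nothing
                out[i + k] = a * x[i + k] + out[i + k]
        i += lanes
    return out

def amdahl_speedup(vector_fraction, lanes):
    """Amdahl's law, assuming the vectorized fraction speeds up by `lanes`."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / lanes)

x = [float(i) for i in range(11)]
y = [1.0] * 11
results = {vl: vla_daxpy(2.0, x, y, vl) for vl in (128, 512, 2048)}
```

The three entries in `results` are identical even though 11 elements do not divide evenly into 2, 8 or 32 lanes, because the predicate masks off the tail. Under the stated assumptions, `amdahl_speedup(0.5, 8)` is about 1.8 while `amdahl_speedup(0.95, 8)` is about 5.9, which is why high vector utilisation matters more than raw vector length.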

ARM Performance Libraries
Enable the wide variety of ARM cores available today without adding complexity to the software ecosystem.
- Commercially supported 64-bit ARMv8 vendor math libraries for scientific computing.
- Built and validated using technology from the Numerical Algorithms Group (NAG).
- ARM silicon partners provide us with tuned kernels.
Capabilities: BLAS, LAPACK, FFT
Tuned for: Cortex-A57, Applied Micro X-Gene, Cavium ThunderX

ARM & OpenHPC
Participation:
- Silver member of OpenHPC
- ARM is on the OpenHPC TSC in order to drive ARM architecture build support
- OpenHPC defines a clear target baseline of codes, many of which have already been ported and are available on ARM. See developer.arm.com/hpc for details.
Status:
- 131 out of 166 packages building (79%)
- Closing in on fixes for the remaining relevant packages and moving towards continuous build integration on multiple ARM platforms

Functional Areas                  Components
Base OS                           RHEL/CentOS 7.1, SLES 12
Administrative Tools              Conman, Ganglia, Lmod, LosF, ORCM, Nagios, pdsh, prun
Provisioning                      Warewulf
Resource Mgmt.                    SLURM, Munge, Altair PBS Pro*
I/O Services                      Lustre client (community version)
Numerical/Scientific Libraries    Boost, GSL, FFTW, Metis, PETSc, Trilinos, Hypre, SuperLU, Mumps
I/O Libraries                     HDF5 (phdf5), NetCDF (including C++ and Fortran interfaces), Adios
Compiler Families                 GNU (gcc, g++, gfortran)
MPI Families                      OpenMPI, MVAPICH2
Development Tools                 Autotools (autoconf, automake, libtool), Valgrind, R, SciPy/NumPy
Performance Tools                 PAPI, Intel IMB, mpip, pdtoolkit TAU

International Standards

Developer website: arm.com/hpc
ARM have launched an HPC-specific microsite. This is home to our HPC ecosystem offering:
- technical reference material
- how-to guides
- latest news and updates from partners
- downloads of HPC libraries
- third-party software recommendations
- web forum for community discussion and help
Participate and help drive the community.