Arm's role in co-design for the next generation of HPC platforms

Size: px
Start display at page:

Download "Arm's role in co-design for the next generation of HPC platforms"

Transcription

1 Arm's role in co-design for the next generation of HPC platforms Arm HPC workshop at SJTU July 2018 Filippo Spiga Software and Large Scale Systems

2 Outline Arm and High Performance Computing Introducing SVE Simulation Tools: gem5 and DynamoRIO+ARMIE SVE Ecosystem Full Application Enablement 2

3 Arm Technology Already Connects the World Arm is Ubiquitous 21 billion chips sold by partners in 2017 alone Mobile/Embedded/IoT/ Automotive/Server/GPUs Partnership is key We design IP, not manufacture chips Partners build products for their target markets Arm is Ubiquitous One size is not always the best fit for all HPC is a great fit for codesign and collaboration 4

4 The Arm Business Model Deploy energy-efficient Arm-based technology, wherever computing happens Leading in wearables and the Internet of Things ~85% share of laptops, tablets, and smartphones Driving the transformation of the network and data center to an Intelligent Flexible Cloud Enabling innovation and creativity with embedded intelligence Taking mobile computing to the next four billion people Partnering to deliver data center efficiency 6

5 Arm Cortex Family Cortex - A Highest performance Optimised for rich operating systems Cortex - R Fast response Optimised for high performance, hard real-time applications Cortex - M Smallest/lowest power Optimised for embedded intelligence and microcontrollers 8

6 Arm application processors are everywhere Cortex-A CPUs cover a wide variety of markets Scale efficiently to substantially higher performance Fit even more compute in a smaller footprint with less power Mobile and Consumer Automotive Servers and networking IoT and Embedded 9

7 Introducing Arm Research Mission Partner to accelerate innovation and transfer research knowledge across Arm and the Arm Ecosystem Objectives Build a pipeline to create and bring future technology into the Arm Ecosystem Create and maintain the emerging technology landscape Enable innovative Academic research through collaboration and partnership Strategic Research Agendas Research Programs Technology Landscapes Collaboration Programs 10

8 Arm Research to End User Research ARM IP SOC OEM Service 5-10 years lead time 12

9 Arm Research Groups Architectures Systems and Memory Design Methodology Software and Large Scale Systems New Materials, Devices and Circuits IoT and Energy-Constrained Systems (+ Security) 14

10 Arm High Performance Computing strategy Enablement Address gaps in computational capability and data movement within Architecture Seed the software ecosystem with open source support for Armv8 and SVE libraries, tools, and optimized workloads Provide world class tools for compilation, analysis, and debug at large scale. Mission Enable the world s first Arm supercomputer(s) Strategy Enablement + Co-Design + Partnership Strategy: Enablement + Co-Design + Partnership Building Blocks Co-Design Work with key end-customers in DoE, DoD, RIKEN, and EU to design balanced architecture, uarchitecture and SoCs based on real-world workloads, not benchmarks. Develop simulation and modelling tools to support co-design development with end-customers, partners, and academia. Partnership Work with Architecture partners to quickly bring optimized solutions to market. Work with ATG & uarchitecture design teams to steer future designs to be more relevant for HPC, HPDA, and ML Work with key ISVs to enable mid-market 17

11 What it is Co-design? Abstract: Preparations for Exascale computing have led to the realization that future computing environments will be significantly different from those that provide Petascale capabilities. This change is driven by energy constraints, which is compelling architects to design systems that will require a significant re-thinking of how algorithms are developed and implemented. Co-design has been proposed as a methodology for scientific application, software and hardware communities to work together. This chapter gives an overview of co-design and discusses the early application of this methodology to HighPerformance Computing. [ ] The co-design strategy is based on developing partnerships with computer vendors and application scientists and engaging them in a highly collaborative and iterative design process well before a given system is available for commercial use. The process is built around identifying leading edge, high-impact scientific applications and providing concrete optimization targets rather than focusing on speeds andfeeds (FLOPs andbandwidth) andpercent ofpeak. On the Role of Co-design in High Performance Computing, Transition of HPC Towards Exascale Computing (2013) 18

12 Co-design with Arm Analysis of applications to devise the most efficient solutions Applications System Software & Tools Platform Hardware (the implementation ) Architecture Opportunities to exploit 19

13 History of Arm in HPC 2011 Calxada 32-bit ARMv7-A Cortex A AMD Opteron A bit ARMv8-A Cortex A Cores 2017 Cavium ThunderX 2 64-bit ARMv8-A 32 Cores Mont-Blanc 1 32-bit ARMv7-A Cortex A15 First Arm HPC cluster 2015 Cavium ThunderX 64-bit ARMv8-A 48 Cores 20

14 Introducing SVE

15 Performance Portability 1 st source of frustration for Applications Developers Start thinking massively parallel easy to say, hard in practice! Exploit massive parallelism without over-specifying the details Programmer may be able to provide hints to a compiler/runtime but how good in self-discover and self-tuning? Exploit heterogeneous compute elements is hard MPI+X still good enough? Will we ever move beyond it? One programming language to rule them all? Handling efficient data sharing across components The CPU has everything we need, can we (Arm) make CPU architecture better? 30

16 Evolution of Arm SIMD architectures Single Instruction Multiple Data = data level parallelism, many operations per instruction Armv6 SIMD bit integer/core register file Integer only 2x16-bit fixed-point data elements Armv7-A Advanced SIMD (aka Arm NEON) bit SIMD register file, supporting well-conditioned memory data layouts Non-IEEE single-precision floating-point and 8/16/32-bit fixed-point data elements Armv8-A AArch64 Advanced SIMD was an evolutionary step b SIMD register file, identical memory data layouts Full IEEE double-precision floating-point and 64-bit fixed-point data elements But new markets for AArch64 (HPC) have demanded more radical features Ability to vectorize irregular code and more complex data structures Longer vectors to extract more data-level parallelism per cycle? 31

17 Scalable Vector Extension (SVE) SVE does not mandate a single, fixed vector length Vector Length (VL) is hardware implementation choice of 128 to 2048 bits Vector Length Agnostic (VLA) programming paradigm made possible by the per-lane predication, predicate-driven loop control, vector partitioning and software-managed speculation SVE is not a simple extension of AArch64 Advanced SIMD A separate, optional architectural extension with a new set of instruction encodings (ARMv8.3) Initial focus is HPC and general-purpose server software, not media/image processing SVE begins to address traditional barriers to auto-vectorization Compilers often cannot vectorize due to intra-vector control and data dependencies Software-managed speculative vectorization allows more loops to be vectorized by a compiler 32

18 Introducing the Scalable Vector Extension (SVE) A vector extension to the ARMv8-A architecture with some major new features: Gather-load and scatter-store Loads a single register from several non-contiguous memory locations. + pred = Per-lane predication Operations work on individual lanes under control of a predicate register. for (i = 0; i < n; ++i) INDEX i n-2 n-1 n n+1 CMPLT n Predicate-driven loop control and management Eliminate scalar loop heads and tails by processing partial vectors. 33

19 Introducing the Scalable Vector Extension (SVE) A vector extension to the ARMv8-A architecture with some major new features: + pred Vector partitioning and software-managed speculation First Faulting Load instructions allow memory accesses to cross into invalid pages = = = = Extended floating-point horizontal reductions In-order and tree-based reductions trade-off performance and repeatability. 34

20 Example:daxpy (scalar) void daxpy(double *x, double *y, double a, int n) { for (int i = 0; i < n; i++) { y[i] = a * x[i] + y[i]; } } // x0 = &x[0] // x1 = &y[0] // x2 = &a // x3 = &n daxpy_: ldrsw x3, [x3] mov x4, #0 ldr d0, [x2] b.latch.loop: ldr d1, [x0, x4, lsl #3] ldr d2, [x1, x4, lsl #3] fmadd d2, d1, d0, d2 str d2, [x1, x4, lsl #3] add x4, x4, #1.latch: cmp x4, x3 b.lt.loop ret 36

21 daxpy (SVE) daxpy (scalar) daxpy_:.loop:.latch: ldrsw x3, [x3] mov x4, #0 whilelt p0.d, x4, x3 ld1rd z0.d, p0/z, [x2] ld1d z1.d, p0/z, [x0, x4, lsl #3] ld1d z2.d, p0/z, [x1, x4, lsl #3] fmla z2.d, p0/m, z1.d, z0.d st1d z2.d, p0, [x1, x4, lsl #3] incd x4 whilelt b.first ret p0.d, x4, x3.loop daxpy_:.loop:.latch: ldrsw x3, [x3] mov x4, #0 ldr d0, [x2] b.latch ldr d1, [x0, x4, lsl #3] ldr d2, [x1, x4, lsl #3] fmadd d2, d1, d0, d2 str d2, [x1, x4, lsl #3] add x4, x4, #1 cmp b.lt ret x4, x3.loop Q1: How do we handle the non-multiples of VL? Q2: What happens at different vector lengths? check bonus slides 37

22 Vector Architecture Design Trade-offs SLC SLC MC MC Example of Cache Coherent Vector Microarchitecture Cores with one or more SVE and Load/Store (LS) unit(s) NoC Private L1 and L2 caches (considered part of the core) System-level cache (SLC) shared among cores L2 Memory controllers (MC) L1 Network on chip (NoC) interconnects cores, SLCs and MCs SVE LS Core Core Disclaimer: Logical representation, not representative of a physical implementation 38

23 Simulation Tools: gem5 and DynamoRIO+ARMIE

24 gem5 gem5: open-source sophisticated processor and computer system simulator Runs real workloads Analyze workloads that customers use and care about including complex workloads such as Android Comprehensive model library Memory and I/O devices Full OS, Web browsers Rapid early prototyping But not a microarchitectural model out of the box! Ubuntu (Linux 4.x) Android Nougat New ideas can be tested quickly System-level impact can be quantified System-level insights Enables us to study complex memory-system 44

25 When Not to Use gem5 Performance validation gem5 is not a cycle-accurate microarchitecture model for any specific product This typically requires more accurate models such as RTL simulation. Commercial products such as Arm CycleModels operate in this space. Core microarchitecture exploration Only do this if you have a custom, detailed, CPU model! gem5 s core models were not designed to replace more accurate microarchitectural models. To validate functional correctness or test bleeding-edge ISA improvements gem5 is not as rigorously tested as commercial products. New (ARMv8.0+) or optional instructions are sometimes not implemented Commercial products such as Arm FastModels offer better reliability in this space. 45

26 Arm Research Enablement Kit System Modeling Using gem5 System-on-chip (SoC) systems are complex and expensive to prototype Simulation is a cost-effective way to evaluate new ideas gem5 models various system components, such as different CPU models, interconnection topologies, memory subsystems, etc. gem5 also models the interactions among system components Arm Research Enablement Kit is a guide through Arm-based system modeling using the gem5 simulator and a 64-bit processor model based on Armv8-A. 46

27 SVE Support in gem5 Commitment from Arm Research to open-source full support for SVE in gem5, with the goal of enabling research around SVE (and vector architectures in general) Main contributors: Giacomo Gabrielli Rekai Gonzalez Alberquilla Javier Setoain Gabor Dosza Still technically a work-in-progress, but most of the functionality is in place! 47

28 Where Do I Get It From? > git clone > git checkout sve/beta1 sve/beta1 is the latest development branch the plan is to merge all changes onto master after external review process Check README_SVE.md for up-to-date list of known issues and limitations 48

29 DynamoRIO Open-source BSD project Support for AArch32, AArch64 (added by Arm Research), IA-32, and x86-64 Enable runtime code manipulation system that supports code transformations on any part of a program, while it executes. Provides utilities for memory address tracing Opportunity for custom instrumentations via clients 49

30 DynamoRIO + ArmIE SVE DynamoRIO does not offer (yet!) natively SVE support Arm Instruction Emulator (ArmIE) is used to execute and instrument any SVE instructions Memory mapped shared counter ensures output correctness for traces 3 Clients currently supported: Memory Tracing Instruction Tracing Instruction Count Individual traces can be merged and analyze via additional post-processing scripts DynamoRIO ArmIE SVE binary SVE instrumentation AArch64 instrumentation Shared Counter 51

31 DynamoRIO + ArmIE SVE Code intrinsics must be added to the applications to enable instrumentation These define the region-of-interest to evaluate DynamoRIO ArmIE SVE binary Shared Counter #define START_TRACE() { } #define STOP_TRACE() { } Full compatibility with single-threaded apps Multi-thread support currently under work SVE instrumentation AArch64 instrumentation 52

32 DynamoRIO + ARMIE for SVE Tracing Run ARMIE within DR Introduces noise to the instrumentation Avoid tracing ARMIE s memory accesses ARMIE library memory space is different from the application s memory space DR will only generate traces inside the application s memory region Memory mapped shared counter Atomically accessed by ARMIE and DR Post-process script merges traces based on the sequence number freedom of do your own analysis! ARM Executable LDR STR ADD SUB SVE LOAD SVE ADD LDR SVE STR DynamoRIO ARMIE Non-SVE 0x8000 0x x8004 0x x801C 0x x8000 0x x8004 0x x8014 0x x8014 0x x801C 0x x8020 0x x8020 0x40008 S SVE 0x8014 0x x8014 0x x8020 0x x8020 0x40008 S Merged trace Shared Counter 53

33 SVE Ecosystem

34 SVE Programming How to get started Computational Scientists, Domain Experts Assembly (not suggested) Full ISA Specification: The Scalable Vector Extension for ARMv8-A Lots of worked examples, see A sneak peek into SVE and VLA programming Intrinsics (only if truly needed) Arm C Language Extensions for SVE Arm Scalable Vector Extensions and application to Machine Learning Auto-vectorization via GCC, Arm Compiler for HPC, Cray, Fujitsu Compiler Hints to the compiler via OpenMP Best practice writing code #pragma omp parallel for simd 55

35 SVE Tools How to get started Performance Specialists, Computer Architects Arm Compiler Arm Instruction Emulator Arm v8-a Binary Arm v8-a Binary with new features Arm Instruction Emulator Linux Arm v8-a 56 Research Enablement Kit

36 Commercial C/C++/Fortran compiler with best-in-class performance Compilers tuned for Scientific Computing and HPC Latest features and performance optimizations Commercially supported by Arm Tuned for Scientific Computing, HPC and Enterprise workloads Processor-specific optimizations for various server-class Arm-based platforms Optimal shared-memory parallelism using latest Arm-optimized OpenMP runtime Linux user-space compiler with latest features C++ 14 and Fortran 2003 language support with OpenMP 4.5* Support for Armv8-A and SVE architecture extension Based on LLVM and Flang, leading open-source compiler projects Commercially supported by Arm Available for a wide range of Arm-based platforms running leading Linux distributions RedHat, SUSE and Ubuntu 57

37 Arm Compiler Building on LLVM, Clang and Flang projects Arm C/C++/Fortran Compiler C/C++ Files (.c/.cpp) C/C++ Frontend Clang based Optimizer LLVM based ARMv8-A code-gen LLVM based ARMv8-A binary Fortran Files (.f/.f90) PGI Flang based Fortran Frontend LLVM IR IR Optimizations Auto-vectorization Enhanced optimization for ARMv8-A and SVE LLVM IR SVE code-gen LLVM based SVE binary Language specific frontend Language agnostic optimization Architecture specific backend 58

38 Status of GCC for SVE Making progress towards the aim of getting SVE support into GCC 8 GCC support was developed by a team in Arm and released publicly in Nov 2016 This year Linaro have been updating and incorporating those changes into upstream GCC Nearly 60% of the patches have been committed and a further 30% have been accepted but can t be committed yet Aim is to get the rest into GCC 8, due for release in Q Source: Arm HPC Workshop Tokyo 2017 by Linaro

39 Features of GCC for SVE Summary of features and scope of changes Vector-length agnostic and vector-length specific code generation Auto-vectorisation, including: Fully-predicated loops Predicated structure loads and stores Gather loads & scatter stores Aliasing checks with variable strides Non-associative reductions (FADDA) No support for SVE intrinsics yet: aim is to add that to GCC 9 61 Source: Arm HPC Workshop Tokyo 2017 by Linaro

40 Arm Instruction Emulator Develop your user-space applications for future hardware today Run Linux user-space code that uses new hardware features (SVE) on current Arm hardware Simple black box command line tool $ armclang hello.c --march=armv8+sve $./a.out Illegal instruction $ armie msve-vector-bits=256./a.out Hello Arm v8-a Binary Arm v8-a Binary with new features Arm Instruction Emulator Converts unsupported instructions to native Armv8-A instructions Linux Arm v8-a 62

41 Evaluating SVE Compile Emulate Analyse Arm Compiler C/C++/Fortran code SVE via auto-vectorization, intrinsics and assembly. Compiler Insight: Compiler places results of compiletime decisions and analysis in the resulting binary. Arm Instruction Emulator Runs userspace binaries for future Arm architectures on today s systems. Supported instructions run unmodified. Unsupported instructions are trapped and emulated. Arm Code Advisor Console or web-based output shows prioritized advice in-line with original source code. 63

42 Arm Code Advisor Combines static and dynamic information to produce actionable insights Performance Advice Compiler vectorization hints Compilation flags advice OpenMP instrumentation Insight from compilation and runtime Compiler insights are embedded into the application binary Instrumented OpenMP runtime traces OpenMP events Extensible Architecture User can write plugins to add their own analysis information Data accessible via web browser or command line 64

43 Arm Code Advisor Typical workflow 65

44 Arm Code Advisor Screenshot 66

45 Exploring SVE for scientific applications Objective: understand if apps cat benefit from SVE, assess quality and readiness of tools Various Arm-based SoC (Huawei Taishan) Several applications of interest: QE, KKRnano, GRID, BQCD Results on MiniKKR show Arm-based SoC (no SVE) similar performance versus x86 Estimate performance using instruction/branch counting (dynamic) and critical path analysis (static) Early Experience with ARM SVE, presented at SC 17 Arm SVE User Meeting by D. Pleiter (JSC) Exploring SVE for scientific applications, presented at HiPEAC 18 by S. Nassyr (JSC) 67

46 Full Application Enablement

47 Arm Allinea Studio A quick glance at what is in Arm Allinea Studio (latest 18.3) C/C++ Compiler Fortran Compiler Performance Libraries Forge (DDT and MAP) Performance Reports C++ 14 support OpenMP 4.5 without offloading SVE ready Fortran 2003 support Partial Fortran 2008 support OpenMP 3.1 SVE ready Optimized math libraries BLAS, LAPACK and FFT Threaded parallelism with OpenMP Profile, Tune and Debug Scalable debugging with DDT Parallel Profiling with MAP Analyze your application Memory, MPI, Threads, I/O, CPU metrics Tuned by Arm for a wide-range of server-class Arm-based platforms 69

48 Optimized BLAS, LAPACK and FFT Commercially supported by Arm Best serial and parallel performance Commercial 64-bit Armv8-A math libraries Commonly used low-level math routines - BLAS, LAPACK and FFT FFTW compatible interface for FFT routines Batch BLAS support Best serial and parallel performance Generic Armv8-A optimizations by Arm Tuning for specific platforms like Cavium ThunderX2 in collaboration with silicon vendors Validated and supported by Arm Engineers Available for a wide range of server-class Arm-based platforms Validated with NAG s test suite, a de-facto standard Validated with NAG test suite 70

49 18.2 release vs FFTW FFT timings on TX2 1-d complex-complex double precision Higher means new Arm Performance Libraries FFT implementation better than FFTW Results show: speed-up over FFTW 18.1 GROMACS LAMMPS NAMD Quantum ESPRESSO CASTEP 18.2 speeds up over FFTW Average about 30% faster than FFTW Lots of this comes from better usage of NEON vectors on TX2 Cases where we are still slower are: Very small (no major issue) Powers of primes better FFTW better

50 Arm Performance Library 18.3 release $ armclang code_with_math_routines.c \ -L${ARMPL_LIBRARIES} lamath $ armflang code_with_math_routines.f \ -L${ARMPL_LIBRARIES} -lamath 72

51 Arm Forge An interoperable toolkit for debugging and profiling Commercially supported by Arm Fully Scalable The de-facto standard for HPC development Available on the vast majority of the Top500 machines in the world Fully supported by Arm on x86, IBM Power, Nvidia GPUs, etc. State-of-the art debugging and profiling capabilities Powerful and in-depth error detection mechanisms (including memory debugging) Sampling-based profiler to identify and understand bottlenecks Available at any scale (from serial to petaflopic applications) Easy to use by everyone Unique capabilities to simplify remote interactive sessions Innovative approach to present quintessential information to users Very user-friendly 73

52 Software Ecosystem HPC Applications Porting GROMACS NAMD LAMMPS CESM MrBayes Bowtie AMBER Paraview SIESTA VMD WRF Quantum ESPRESSO CP2k MILC GEANT4 OpenFOAM GAMESS VisIT DL_POLY NEMO BLAST NWCHEM Abinit BWA QMCPACK Build recipes online at 74

53 Arm HPC Ecosystem website House for Arm s HPC ecosystem, information channels, and collaboration Latest events, news, blogs, and collateral including webinars, whitepapers and presentations Links to HPC open-source & commercial software packages Recipes for porting HPC apps Curated and moderated by Arm New Arm HPC User Group Forum 75

54 Wiki Dynamic list of common HPC applications Provides focus for porting progress Community driven. Maintained by Arm, but anyone can join and contribute. Allows developers to share recipes, and learn from progress on other applications Provides a mechanism for tracking status of applications and package sets (e.g. OpenHPC packages, Mantevo, etc.) Up-to-date summary of package status 76

55 17-19 September 2018 Robinson College, Cambridge, UK Architecture & Memory Biotechnology Flexible Electronics High Performance Computing Internet of Things Large Scale Systems Machine Learning Security & Servers Agenda available online, early bird expires July 31 st! arm.com/summit

56 Thank You! Danke! Merci!!! Gracias! Kiitos! 78 Thanks to Eric Van Hensbergen, Chris Goodyer, Miguel Tairum- Cruz, Alex Rico, Jose Joao, Giacomo Gabrielli, Olly Perks

57 The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. 79

Arm's role in co-design for the next generation of HPC platforms

Arm's role in co-design for the next generation of HPC platforms Arm's role in co-design for the next generation of HPC platforms Filippo Spiga Software and Large Scale Systems What it is Co-design? Abstract: Preparations for Exascale computing have led to the realization

More information

Arm crossplatform. VI-HPS platform October 16, Arm Limited

Arm crossplatform. VI-HPS platform October 16, Arm Limited Arm crossplatform tools VI-HPS platform October 16, 2018 An introduction to Arm Arm is the world's leading semiconductor intellectual property supplier We license to over 350 partners: present in 95% of

More information

Arm in High Performance Computing: Fortran on AArch64

Arm in High Performance Computing: Fortran on AArch64 Arm in High Performance Computing: Fortran on AArch64 Nathan Sircombe Arm Manchester nathan.sircombe@arm.com 70% of the world s population uses Arm technology 2 Total computing experience Consumer Arm

More information

Software Ecosystem for Arm-based HPC

Software Ecosystem for Arm-based HPC Software Ecosystem for Arm-based HPC CUG 2018 - Stockholm Florent.Lebeau@arm.com Ecosystem for HPC List of components needed: Linux OS availability Compilers Libraries Job schedulers Debuggers Profilers

More information

ARM High Performance Computing

ARM High Performance Computing ARM High Performance Computing Eric Van Hensbergen Distinguished Engineer, Director HPC Software & Large Scale Systems Research IDC HPC Users Group Meeting Austin, TX September 8, 2016 ARM 2016 An introduction

More information

Arm in HPC. Toshinori Kujiraoka Sales Manager, APAC HPC Tools Arm Arm Limited

Arm in HPC. Toshinori Kujiraoka Sales Manager, APAC HPC Tools Arm Arm Limited Arm in HPC Toshinori Kujiraoka Sales Manager, APAC HPC Tools Arm 2019 Arm Limited Arm Technology Connects the World Arm in IOT 21 billion chips in the past year Mobile/Embedded/IoT/ Automotive/GPUs/Servers

More information

ARM Performance Libraries Current and future interests

ARM Performance Libraries Current and future interests ARM Performance Libraries Current and future interests Chris Goodyer Senior Engineering Manager, HPC Software Workshop on Batched, Reproducible, and Reduced Precision BLAS 25 th February 2017 ARM Performance

More information

The Arm Technology Ecosystem: Current Products and Future Outlook

The Arm Technology Ecosystem: Current Products and Future Outlook The Arm Technology Ecosystem: Current Products and Future Outlook Dan Ernst, PhD Advanced Technology Cray, Inc. Why is an Ecosystem Important? An Ecosystem is a collection of common material Developed

More information

Beyond Hardware IP An overview of Arm development solutions

Beyond Hardware IP An overview of Arm development solutions Beyond Hardware IP An overview of Arm development solutions 2018 Arm Limited Arm Technical Symposia 2018 Advanced first design cost (US$ million) IC design complexity and cost aren t slowing down 542.2

More information

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014

Profiling and Debugging OpenCL Applications with ARM Development Tools. October 2014 Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline

More information

Preliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths

Preliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths Preliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths Y. Kodama, T. Odajima, M. Matsuda, M. Tsuji, J. Lee and M. Sato RIKEN AICS (Advanced Institute for Computational

More information

Innovative Alternate Architecture for Exascale Computing. Surya Hotha Director, Product Marketing

Innovative Alternate Architecture for Exascale Computing. Surya Hotha Director, Product Marketing Innovative Alternate Architecture for Exascale Computing Surya Hotha Director, Product Marketing Cavium Corporate Overview Enterprise Mobile Infrastructure Data Center and Cloud Service Provider Cloud

More information

The Mont-Blanc project Updates from the Barcelona Supercomputing Center

The Mont-Blanc project Updates from the Barcelona Supercomputing Center montblanc-project.eu @MontBlanc_EU The Mont-Blanc project Updates from the Barcelona Supercomputing Center Filippo Mantovani This project has received funding from the European Union's Horizon 2020 research

More information

Toward Building up ARM HPC Ecosystem

Toward Building up ARM HPC Ecosystem Toward Building up ARM HPC Ecosystem Shinji Sumimoto, Ph.D. Next Generation Technical Computing Unit FUJITSU LIMITED Sept. 12 th, 2017 0 Outline Fujitsu s Super computer development history and Post-K

More information

A sneak peek into SVE and VLA programming

A sneak peek into SVE and VLA programming A sneak peek into SVE and VLA programming Francesco Petrogalli September 2018 Page 1 of 19 Contents Contents 2 Introduction 2 1 SVE: an overview 3 New architectural registers...............................................

More information

ARM instruction sets and CPUs for wide-ranging applications

ARM instruction sets and CPUs for wide-ranging applications ARM instruction sets and CPUs for wide-ranging applications Chris Turner Director, CPU technology marketing ARM Tech Forum Taipei July 4 th 2017 ARM computing is everywhere #1 shipping GPU in the world

More information

ARMv8-A Scalable Vector Extension for Post-K. Copyright 2016 FUJITSU LIMITED

ARMv8-A Scalable Vector Extension for Post-K. Copyright 2016 FUJITSU LIMITED ARMv8-A Scalable Vector Extension for Post-K 0 Post-K Supports New SIMD Extension The SIMD extension is a 512-bit wide implementation of SVE SVE is an HPC-focused SIMD instruction extension in AArch64

More information

Webinar Tips and Tricks for Porting HPC Apps

Webinar Tips and Tricks for Porting HPC Apps Webinar Tips and Tricks for Porting HPC Apps 2018 Arm Limited John C. Linford 6 December 2018 Arm is changing the game in HPC Sandia s Astra supercomputer is the first Arm-based

More information

SUSE Linux Entreprise Server for ARM

SUSE Linux Entreprise Server for ARM FUT89013 SUSE Linux Entreprise Server for ARM Trends and Roadmap Jay Kruemcke Product Manager jayk@suse.com @mr_sles ARM Overview ARM is a Reduced Instruction Set (RISC) processor family British company,

More information

Post-K: Building the Arm HPC Ecosystem

Post-K: Building the Arm HPC Ecosystem Post-K: Building the Arm HPC Ecosystem Toshiyuki Shimizu FUJITSU LIMITED Nov. 14th, 2017 Exhibitor Forum, SC17, Nov. 14, 2017 0 Post-K: Building up Arm HPC Ecosystem Fujitsu s approach for HPC Approach

More information

Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs

Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Niu Feng Technical Specialist, ARM Tech Symposia 2016 Agenda Introduction Challenges: Optimizing cache coherent subsystem

More information

Bootstrapping a HPC Ecosystem

Bootstrapping a HPC Ecosystem Bootstrapping a HPC Ecosystem Eric Van Hensbergen Fellow Senior Director of HPC Software and Large Scale Systems Research Teratech Forum June 19, 2018 Copyright ARM computing is everywhere #1 shipping

More information

Enabling the ARM high performance computing (HPC) software ecosystem

Enabling the ARM high performance computing (HPC) software ecosystem Enabling the ARM high performance computing (HPC) software ecosystem Ashok Bhat Product manager, HPC and Server tools ARM Tech Symposia India December 7th 2016 Are these supercomputers? For example, the

More information

GOING ARM A CODE PERSPECTIVE

GOING ARM A CODE PERSPECTIVE GOING ARM A CODE PERSPECTIVE ISC18 Guillaume Colin de Verdière JUNE 2018 GCdV PAGE 1 CEA, DAM, DIF, F-91297 Arpajon, France June 2018 A history of disruptions All dates are installation dates of the machines

More information

The Mont-Blanc approach towards Exascale

The Mont-Blanc approach towards Exascale http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are

More information

Arm Processor Technology Update and Roadmap

Arm Processor Technology Update and Roadmap Arm Processor Technology Update and Roadmap ARM Processor Technology Update and Roadmap Cavium: Giri Chukkapalli is a Distinguished Engineer in the Data Center Group (DCG) Introduction to ARM Architecture

More information

Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications

Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications Helena Zheng ML Group, Arm Arm Technical Symposia 2017, Taipei Machine Learning is a Subset of Artificial

More information

ARM processors driving automotive innovation

ARM processors driving automotive innovation ARM processors driving automotive innovation Chris Turner Director of advanced technology marketing, CPU group ARM tech forums, Seoul and Taipei June/July 2016 The ultimate intelligent connected device

More information

Toward Building up Arm HPC Ecosystem --Fujitsu s Activities--

Toward Building up Arm HPC Ecosystem --Fujitsu s Activities-- Toward Building up Arm HPC Ecosystem --Fujitsu s Activities-- Shinji Sumimoto, Ph.D. Next Generation Technical Computing Unit FUJITSU LIMITED Jun. 28 th, 2018 0 Copyright 2018 FUJITSU LIMITED Outline of

More information

Comparative Benchmarking of the First Generation of HPC-Optimised Arm Processors on Isambard

Comparative Benchmarking of the First Generation of HPC-Optimised Arm Processors on Isambard Prof Simon McIntosh-Smith Isambard PI University of Bristol / GW4 Alliance Comparative Benchmarking of the First Generation of HPC-Optimised Arm Processors on Isambard Isambard system specification 10,000+

More information

Each Milliwatt Matters

Each Milliwatt Matters Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets

More information

Modern Processor Architectures. L25: Modern Compiler Design

Modern Processor Architectures. L25: Modern Compiler Design Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions

More information

The Bifrost GPU architecture and the ARM Mali-G71 GPU

The Bifrost GPU architecture and the ARM Mali-G71 GPU The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our

More information

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer

Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems Ed Hinkel Senior Sales Engineer Agenda Overview - Rogue Wave & TotalView GPU Debugging with TotalView Nvdia CUDA Intel Phi 2

More information

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design

Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant

More information

AMD S X86 OPEN64 COMPILER. Michael Lai AMD

AMD S X86 OPEN64 COMPILER. Michael Lai AMD AMD S X86 OPEN64 COMPILER Michael Lai AMD CONTENTS Brief History AMD and Open64 Compiler Overview Major Components of Compiler Important Optimizations Recent Releases Performance Applications and Libraries

More information

AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016

AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016 AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING BILL.BRANTLEY@AMD.COM, FELLOW 3 OCTOBER 2016 AMD S VISION FOR EXASCALE COMPUTING EMBRACING HETEROGENEITY CHAMPIONING OPEN SOLUTIONS ENABLING LEADERSHIP

More information

Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University Scalable Tools Workshop 7 August 2017

Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University   Scalable Tools Workshop 7 August 2017 Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University http://hpctoolkit.org Scalable Tools Workshop 7 August 2017 HPCToolkit 1 HPCToolkit Workflow source code compile &

More information

The challenge of SVE in QEMU. Alex Bennée Senior Virtualization Engineer

The challenge of SVE in QEMU. Alex Bennée Senior Virtualization Engineer The challenge of SVE in QEMU Alex Bennée Senior Virtualization Engineer alex.bennee@linaro.org What is ARM s SVE? Why implement SVE in QEMU? How does QEMU s TCG work? Challenges for QEMU s emulation Current

More information

S Comparing OpenACC 2.5 and OpenMP 4.5

S Comparing OpenACC 2.5 and OpenMP 4.5 April 4-7, 2016 Silicon Valley S6410 - Comparing OpenACC 2.5 and OpenMP 4.5 James Beyer, NVIDIA Jeff Larkin, NVIDIA GTC16 April 7, 2016 History of OpenMP & OpenACC AGENDA Philosophical Differences Technical

More information

Tools and Methodology for Ensuring HPC Programs Correctness and Performance. Beau Paisley

Tools and Methodology for Ensuring HPC Programs Correctness and Performance. Beau Paisley Tools and Methodology for Ensuring HPC Programs Correctness and Performance Beau Paisley bpaisley@allinea.com About Allinea Over 15 years of business focused on parallel programming development tools Strong

More information

Bringing Intelligence to Enterprise Storage Drives

Bringing Intelligence to Enterprise Storage Drives Bringing Intelligence to Enterprise Storage Drives Neil Werdmuller Director Storage Solutions Arm Santa Clara, CA 1 Who am I? 28 years experience in embedded Lead the storage solutions team Work closely

More information

Transforming the Data Center with ARM

Transforming the Data Center with ARM WHITE PAPER Transforming the Data Center with ARM Maximizing Energy, Scalability, and Performance in the Modern Data Center IT leaders looking to modernize their computer, networking and storage systems

More information

HPC Network Stack on ARM

HPC Network Stack on ARM HPC Network Stack on ARM Pavel Shamis (Pasha) Principal Research Engineer ARM Research ExaComm 2017 06/22/2017 HPC network stack on ARM? 2 Serious ARM HPC deployments starting in 2017 ARM Emerging CPU

More information

Building supercomputers from embedded technologies

Building supercomputers from embedded technologies http://www.montblanc-project.eu Building supercomputers from embedded technologies Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results

More information

OpenMP 4.0: A Significant Paradigm Shift in Parallelism

OpenMP 4.0: A Significant Paradigm Shift in Parallelism OpenMP 4.0: A Significant Paradigm Shift in Parallelism Michael Wong OpenMP CEO michaelw@ca.ibm.com http://bit.ly/sc13-eval SC13 OpenMP 4.0 released 2 Agenda The OpenMP ARB History of OpenMP OpenMP 4.0

More information

Bifrost - The GPU architecture for next five billion

Bifrost - The GPU architecture for next five billion Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28 th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor

More information

SIMD Exploitation in (JIT) Compilers

SIMD Exploitation in (JIT) Compilers SIMD Exploitation in (JIT) Compilers Hiroshi Inoue, IBM Research - Tokyo 1 What s SIMD? Single Instruction Multiple Data Same operations applied for multiple elements in a vector register input 1 A0 input

More information

The Stampede is Coming: A New Petascale Resource for the Open Science Community

The Stampede is Coming: A New Petascale Resource for the Open Science Community The Stampede is Coming: A New Petascale Resource for the Open Science Community Jay Boisseau Texas Advanced Computing Center boisseau@tacc.utexas.edu Stampede: Solicitation US National Science Foundation

More information

Accelerating Insights In the Technical Computing Transformation

Accelerating Insights In the Technical Computing Transformation Accelerating Insights In the Technical Computing Transformation Dr. Rajeeb Hazra Vice President, Data Center Group General Manager, Technical Computing Group June 2014 TOP500 Highlights Intel Xeon Phi

More information

Post-K Supercomputer Overview. Copyright 2016 FUJITSU LIMITED

Post-K Supercomputer Overview. Copyright 2016 FUJITSU LIMITED Post-K Supercomputer Overview 1 Post-K supercomputer overview Developing Post-K as the successor to the K computer with RIKEN Developing HPC-optimized high performance CPU and system software Selected

More information

Industrial-Strength High-Performance RISC-V Processors for Energy-Efficient Computing

Industrial-Strength High-Performance RISC-V Processors for Energy-Efficient Computing Industrial-Strength High-Performance RISC-V Processors for Energy-Efficient Computing Dave Ditzel dave@esperanto.ai President and CEO Esperanto Technologies, Inc. 7 th RISC-V Workshop November 28, 2017

More information

The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations

The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations Ophir Maor HPC Advisory Council ophir@hpcadvisorycouncil.com The HPC-AI Advisory Council World-wide HPC non-profit

More information

European energy efficient supercomputer project

European energy efficient supercomputer project http://www.montblanc-project.eu European energy efficient supercomputer project Simon McIntosh-Smith University of Bristol (Based on slides from Alex Ramirez, BSC) Disclaimer: Speaking for myself... All

More information

OmpCloud: Bridging the Gap between OpenMP and Cloud Computing

OmpCloud: Bridging the Gap between OpenMP and Cloud Computing OmpCloud: Bridging the Gap between OpenMP and Cloud Computing Hervé Yviquel, Marcio Pereira and Guido Araújo University of Campinas (UNICAMP), Brazil A bit of background qguido Araujo, PhD Princeton University

More information

April 2 nd, Bob Burroughs Director, HPC Solution Sales

April 2 nd, Bob Burroughs Director, HPC Solution Sales April 2 nd, 2019 Bob Burroughs Director, HPC Solution Sales Today - Introducing 2 nd Generation Intel Xeon Scalable Processors how Intel Speeds HPC performance Work Time System Peak Efficiency Software

More information

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews

Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models Jason Andrews Agenda System Performance Analysis IP Configuration System Creation Methodology: Create,

More information

Bringing Intelligence to Enterprise Storage Drives

Bringing Intelligence to Enterprise Storage Drives Bringing Intelligence to Enterprise Storage Drives Neil Werdmuller Director Storage Solutions Arm Santa Clara, CA 1 Who am I? 28 years experience in embedded Lead the storage solutions team Work closely

More information

Designing, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems

Designing, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems Designing, developing, debugging ARM and heterogeneous multi-processor systems Kinjal Dave Senior Product Manager, ARM ARM Tech Symposia India December 7 th 2016 Topics Introduction System design Software

More information

WAVE ONE MAINFRAME WAVE THREE INTERNET WAVE FOUR MOBILE & CLOUD WAVE TWO PERSONAL COMPUTING & SOFTWARE Arm Limited

WAVE ONE MAINFRAME WAVE THREE INTERNET WAVE FOUR MOBILE & CLOUD WAVE TWO PERSONAL COMPUTING & SOFTWARE Arm Limited WAVE ONE MAINFRAME WAVE THREE INTERNET WAVE FOUR MOBILE & CLOUD WAVE TWO PERSONAL COMPUTING & SOFTWARE Artificial Intelligence Fifth wave Data-driven computing era IoT Generating data 5G 5G Transporting

More information

Update of Post-K Development Yutaka Ishikawa RIKEN AICS

Update of Post-K Development Yutaka Ishikawa RIKEN AICS Update of Post-K Development Yutaka Ishikawa RIKEN AICS 11:20AM 11:40AM, 2 nd of November, 2017 FLAGSHIP2020 Project Missions Building the Japanese national flagship supercomputer, post K, and Developing

More information

The Benefits of GPU Compute on ARM Mali GPUs

The Benefits of GPU Compute on ARM Mali GPUs The Benefits of GPU Compute on ARM Mali GPUs Tim Hartley 1 SEMICON Europa 2014 ARM Introduction World leading semiconductor IP Founded in 1990 1060 processor licenses sold to more than 350 companies >

More information

Developing, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge

Developing, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge Developing, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge Ryan Hulguin Applications Engineer ryan.hulguin@arm.com Agenda Introduction Overview of Allinea Products

More information

New Approaches to Connected Device Security

New Approaches to Connected Device Security New Approaches to Connected Device Security Erik Jacobson Architecture Marketing Director Arm Arm Techcon 2017 - If you connect it to the Internet, someone will try to hack it. - If what you put on the

More information

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?

Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA

More information

CLICK TO EDIT MASTER TITLE STYLE. Click to edit Master text styles. Second level Third level Fourth level Fifth level

CLICK TO EDIT MASTER TITLE STYLE. Click to edit Master text styles. Second level Third level Fourth level Fifth level CLICK TO EDIT MASTER TITLE STYLE Second level THE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION

More information

ARM TrustZone for ARMv8-M for software engineers

ARM TrustZone for ARMv8-M for software engineers ARM TrustZone for ARMv8-M for software engineers Ashok Bhat Product Manager, HPC and Server tools ARM Tech Symposia India December 7th 2016 The need for security Communication protection Cryptography,

More information

The Next Steps in the Evolution of ARM Cortex-M

The Next Steps in the Evolution of ARM Cortex-M The Next Steps in the Evolution of ARM Cortex-M Joseph Yiu Senior Embedded Technology Manager CPU Group ARM Tech Symposia China 2015 November 2015 Trust & Device Integrity from Sensor to Server 2 ARM 2015

More information

Pedraforca: a First ARM + GPU Cluster for HPC

Pedraforca: a First ARM + GPU Cluster for HPC www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu

More information

ARM mbed: Internet of Possible

ARM mbed: Internet of Possible ARM mbed: Internet of Possible Bill Woo Director ISG Sales El Tower / 2017 Tech Forum June 28, 2017 Introduction Today enterprises are under pressure to unlock the value in the Internet of Things. Our

More information

The Changing Face of Edge Compute

The Changing Face of Edge Compute The Changing Face of Edge Compute 2018 Arm Limited Alvin Yang Nov 2018 Market trends acceleration of technology deployment 26 years 4 years 100 billion chips shipped 100 billion chips shipped 1 Trillion

More information

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS

SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous

More information

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7

Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7 Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7 Improving Energy Efficiency in High-Performance Mobile Platforms Peter Greenhalgh, ARM September 2011 This paper presents the rationale and design

More information

Barcelona Supercomputing Center

Barcelona Supercomputing Center www.bsc.es Barcelona Supercomputing Center Centro Nacional de Supercomputación EMIT 2016. Barcelona June 2 nd, 2016 Barcelona Supercomputing Center Centro Nacional de Supercomputación BSC-CNS objectives:

More information

The Mont-Blanc Project

The Mont-Blanc Project http://www.montblanc-project.eu The Mont-Blanc Project Daniele Tafani Leibniz Supercomputing Centre 1 Ter@tec Forum 26 th June 2013 This project and the research leading to these results has received funding

More information

Trends in HPC (hardware complexity and software challenges)

Trends in HPC (hardware complexity and software challenges) Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18

More information

The Cray Programming Environment. An Introduction

The Cray Programming Environment. An Introduction The Cray Programming Environment An Introduction Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent

More information

RISC-V: Enabling a New Era of Open Data-Centric Computing Architectures

RISC-V: Enabling a New Era of Open Data-Centric Computing Architectures Presentation Brief RISC-V: Enabling a New Era of Open Data-Centric Computing Architectures Delivers Independent Resource Scaling, Open Source, and Modular Chip Design for Big Data and Fast Data Environments

More information

ARMv8-A Next-Generation Vector Architecture for HPC

ARMv8-A Next-Generation Vector Architecture for HPC ARMv8-A Next-Generation Vector Architecture for HPC Nigel Stephens Lead ISA Architect and ARM Fellow Hot Chips 28, Cupertino August 22, 2016 Expanding ARMv8 vector processing ARMv7 Advanced SIMD (aka ARM

More information

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE

HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE Haibo Xie, Ph.D. Chief HSA Evangelist AMD China OUTLINE: The Challenges with Computing Today Introducing Heterogeneous System Architecture (HSA)

More information

Productive Performance on the Cray XK System Using OpenACC Compilers and Tools

Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 The New Generation of Supercomputers Hybrid

More information

Programming for the Intel Many Integrated Core Architecture By James Reinders. The Architecture for Discovery. PowerPoint Title

Programming for the Intel Many Integrated Core Architecture By James Reinders. The Architecture for Discovery. PowerPoint Title Programming for the Intel Many Integrated Core Architecture By James Reinders The Architecture for Discovery PowerPoint Title Intel Xeon Phi coprocessor 1. Designed for Highly Parallel workloads 2. and

More information

What is gem5 and where do I get it?

What is gem5 and where do I get it? What is gem5 and where do I get it? Andreas Sandberg & Nikos Nikoleris ARM Research Why gem5? Runs real workloads Runs complex workloads like Android & ChromeOS System-level insights Device interactions

More information

2017 Arm Limited. How to design an IoT SoC and get Arm CPU IP for no upfront license fee

2017 Arm Limited. How to design an IoT SoC and get Arm CPU IP for no upfront license fee 2017 Arm Limited How to design an IoT SoC and get Arm CPU IP for no upfront license fee An enhanced Arm DesignStart Building on a strong foundation Successfully used by 1000s of designers, researchers

More information

ARMv8-A CPU Architecture Overview

ARMv8-A CPU Architecture Overview ARMv8-A CPU Architecture Overview Chris Shore Training Manager, ARM ARM Game Developer Day, London 03/12/2015 Chris Shore ARM Training Manager With ARM for 16 years Managing customer training for 15 years

More information

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING Accelerated computing is revolutionizing the economics of the data center. HPC and hyperscale customers deploy accelerated

More information

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving

Cortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving Cortex-A75 and Cortex- DynamIQ processors Powering applications from mobile to autonomous driving Lionel Belnet Sr. Product Manager Arm Arm Tech Symposia 2017 Agenda Market growth and trends DynamIQ technology

More information

Scientific Programming in C XIV. Parallel programming

Scientific Programming in C XIV. Parallel programming Scientific Programming in C XIV. Parallel programming Susi Lehtola 11 December 2012 Introduction The development of microchips will soon reach the fundamental physical limits of operation quantum coherence

More information

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING

TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING Accelerated computing is revolutionizing the economics of the data center. HPC enterprise and hyperscale customers deploy

More information

Portable and Productive Performance with OpenACC Compilers and Tools. Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc.

Portable and Productive Performance with OpenACC Compilers and Tools. Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. Portable and Productive Performance with OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 Cray: Leadership in Computational Research Earth Sciences

More information

PORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune

PORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune PORTING CP2K TO THE INTEL XEON PHI ARCHER Technical Forum, Wed 30 th July Iain Bethune (ibethune@epcc.ed.ac.uk) Outline Xeon Phi Overview Porting CP2K to Xeon Phi Performance Results Lessons Learned Further

More information

ARMv8-A Software Development

ARMv8-A Software Development ARMv8-A Software Development Course Description ARMv8-A software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop software for

More information

Industry Collaboration and Innovation

Industry Collaboration and Innovation Industry Collaboration and Innovation Industry Landscape Key changes occurring in our industry Historical microprocessor technology continues to deliver far less than the historical rate of cost/performance

More information

OpenACC Course. Office Hour #2 Q&A

OpenACC Course. Office Hour #2 Q&A OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle

More information

An Introduction to OpenACC

An Introduction to OpenACC An Introduction to OpenACC Alistair Hart Cray Exascale Research Initiative Europe 3 Timetable Day 1: Wednesday 29th August 2012 13:00 Welcome and overview 13:15 Session 1: An Introduction to OpenACC 13:15

More information

FUJITSU HPC and the Development of the Post-K Supercomputer

FUJITSU HPC and the Development of the Post-K Supercomputer FUJITSU HPC and the Development of the Post-K Supercomputer Toshiyuki Shimizu Vice President, System Development Division, Next Generation Technical Computing Unit 0 November 16 th, 2016 Post-K is currently

More information

ECMWF Workshop on High Performance Computing in Meteorology. 3 rd November Dean Stewart

ECMWF Workshop on High Performance Computing in Meteorology. 3 rd November Dean Stewart ECMWF Workshop on High Performance Computing in Meteorology 3 rd November 2010 Dean Stewart Agenda Company Overview Rogue Wave Product Overview IMSL Fortran TotalView Debugger Acumem ThreadSpotter 1 Copyright

More information

Expanding Opportunities in Clamshell Devices. Laurence Bryant VP Strategic Marketing

Expanding Opportunities in Clamshell Devices. Laurence Bryant VP Strategic Marketing Expanding Opportunities in Clamshell Devices Laurence Bryant VP Strategic Marketing 1 PC Mobile Ecosystem Scaling The Richness Of Small Screen Experiences The smartphone and tablet ecosystem is shaping

More information

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy

More information

A Fast Instruction Set Simulator for RISC-V

A Fast Instruction Set Simulator for RISC-V A Fast Instruction Set Simulator for RISC-V Maxim.Maslov@esperantotech.com Vadim.Gimpelson@esperantotech.com Nikita.Voronov@esperantotech.com Dave.Ditzel@esperantotech.com Esperanto Technologies, Inc.

More information