Arm's role in co-design for the next generation of HPC platforms
|
|
- Harry Ross
- 5 years ago
- Views:
Transcription
1 Arm's role in co-design for the next generation of HPC platforms Arm HPC workshop at SJTU July 2018 Filippo Spiga Software and Large Scale Systems
2 Outline Arm and High Performance Computing Introducing SVE Simulation Tools: gem5 and DynamoRIO+ARMIE SVE Ecosystem Full Application Enablement 2
3 Arm Technology Already Connects the World Arm is Ubiquitous 21 billion chips sold by partners in 2017 alone Mobile/Embedded/IoT/ Automotive/Server/GPUs Partnership is key We design IP, not manufacture chips Partners build products for their target markets Arm is Ubiquitous One size is not always the best fit for all HPC is a great fit for codesign and collaboration 4
4 The Arm Business Model Deploy energy-efficient Arm-based technology, wherever computing happens Leading in wearables and the Internet of Things ~85% share of laptops, tablets, and smartphones Driving the transformation of the network and data center to an Intelligent Flexible Cloud Enabling innovation and creativity with embedded intelligence Taking mobile computing to the next four billion people Partnering to deliver data center efficiency 6
5 Arm Cortex Family Cortex - A Highest performance Optimised for rich operating systems Cortex - R Fast response Optimised for high performance, hard real-time applications Cortex - M Smallest/lowest power Optimised for embedded intelligence and microcontrollers 8
6 Arm application processors are everywhere Cortex-A CPUs cover a wide variety of markets Scale efficiently to substantially higher performance Fit even more compute in a smaller footprint with less power Mobile and Consumer Automotive Servers and networking IoT and Embedded 9
7 Introducing Arm Research Mission Partner to accelerate innovation and transfer research knowledge across Arm and the Arm Ecosystem Objectives Build a pipeline to create and bring future technology into the Arm Ecosystem Create and maintain the emerging technology landscape Enable innovative Academic research through collaboration and partnership Strategic Research Agendas Research Programs Technology Landscapes Collaboration Programs 10
8 Arm Research to End User Research ARM IP SOC OEM Service 5-10 years lead time 12
9 Arm Research Groups Architectures Systems and Memory Design Methodology Software and Large Scale Systems New Materials, Devices and Circuits IoT and Energy-Constrained Systems (+ Security) 14
10 Arm High Performance Computing strategy Enablement Address gaps in computational capability and data movement within Architecture Seed the software ecosystem with open source support for Armv8 and SVE libraries, tools, and optimized workloads Provide world class tools for compilation, analysis, and debug at large scale. Mission Enable the world s first Arm supercomputer(s) Strategy Enablement + Co-Design + Partnership Strategy: Enablement + Co-Design + Partnership Building Blocks Co-Design Work with key end-customers in DoE, DoD, RIKEN, and EU to design balanced architecture, uarchitecture and SoCs based on real-world workloads, not benchmarks. Develop simulation and modelling tools to support co-design development with end-customers, partners, and academia. Partnership Work with Architecture partners to quickly bring optimized solutions to market. Work with ATG & uarchitecture design teams to steer future designs to be more relevant for HPC, HPDA, and ML Work with key ISVs to enable mid-market 17
11 What it is Co-design? Abstract: Preparations for Exascale computing have led to the realization that future computing environments will be significantly different from those that provide Petascale capabilities. This change is driven by energy constraints, which is compelling architects to design systems that will require a significant re-thinking of how algorithms are developed and implemented. Co-design has been proposed as a methodology for scientific application, software and hardware communities to work together. This chapter gives an overview of co-design and discusses the early application of this methodology to HighPerformance Computing. [ ] The co-design strategy is based on developing partnerships with computer vendors and application scientists and engaging them in a highly collaborative and iterative design process well before a given system is available for commercial use. The process is built around identifying leading edge, high-impact scientific applications and providing concrete optimization targets rather than focusing on speeds andfeeds (FLOPs andbandwidth) andpercent ofpeak. On the Role of Co-design in High Performance Computing, Transition of HPC Towards Exascale Computing (2013) 18
12 Co-design with Arm Analysis of applications to devise the most efficient solutions Applications System Software & Tools Platform Hardware (the implementation ) Architecture Opportunities to exploit 19
13 History of Arm in HPC 2011 Calxada 32-bit ARMv7-A Cortex A AMD Opteron A bit ARMv8-A Cortex A Cores 2017 Cavium ThunderX 2 64-bit ARMv8-A 32 Cores Mont-Blanc 1 32-bit ARMv7-A Cortex A15 First Arm HPC cluster 2015 Cavium ThunderX 64-bit ARMv8-A 48 Cores 20
14 Introducing SVE
15 Performance Portability 1 st source of frustration for Applications Developers Start thinking massively parallel easy to say, hard in practice! Exploit massive parallelism without over-specifying the details Programmer may be able to provide hints to a compiler/runtime but how good in self-discover and self-tuning? Exploit heterogeneous compute elements is hard MPI+X still good enough? Will we ever move beyond it? One programming language to rule them all? Handling efficient data sharing across components The CPU has everything we need, can we (Arm) make CPU architecture better? 30
16 Evolution of Arm SIMD architectures Single Instruction Multiple Data = data level parallelism, many operations per instruction Armv6 SIMD bit integer/core register file Integer only 2x16-bit fixed-point data elements Armv7-A Advanced SIMD (aka Arm NEON) bit SIMD register file, supporting well-conditioned memory data layouts Non-IEEE single-precision floating-point and 8/16/32-bit fixed-point data elements Armv8-A AArch64 Advanced SIMD was an evolutionary step b SIMD register file, identical memory data layouts Full IEEE double-precision floating-point and 64-bit fixed-point data elements But new markets for AArch64 (HPC) have demanded more radical features Ability to vectorize irregular code and more complex data structures Longer vectors to extract more data-level parallelism per cycle? 31
17 Scalable Vector Extension (SVE) SVE does not mandate a single, fixed vector length Vector Length (VL) is hardware implementation choice of 128 to 2048 bits Vector Length Agnostic (VLA) programming paradigm made possible by the per-lane predication, predicate-driven loop control, vector partitioning and software-managed speculation SVE is not a simple extension of AArch64 Advanced SIMD A separate, optional architectural extension with a new set of instruction encodings (ARMv8.3) Initial focus is HPC and general-purpose server software, not media/image processing SVE begins to address traditional barriers to auto-vectorization Compilers often cannot vectorize due to intra-vector control and data dependencies Software-managed speculative vectorization allows more loops to be vectorized by a compiler 32
18 Introducing the Scalable Vector Extension (SVE) A vector extension to the ARMv8-A architecture with some major new features: Gather-load and scatter-store Loads a single register from several non-contiguous memory locations. + pred = Per-lane predication Operations work on individual lanes under control of a predicate register. for (i = 0; i < n; ++i) INDEX i n-2 n-1 n n+1 CMPLT n Predicate-driven loop control and management Eliminate scalar loop heads and tails by processing partial vectors. 33
19 Introducing the Scalable Vector Extension (SVE) A vector extension to the ARMv8-A architecture with some major new features: + pred Vector partitioning and software-managed speculation First Faulting Load instructions allow memory accesses to cross into invalid pages = = = = Extended floating-point horizontal reductions In-order and tree-based reductions trade-off performance and repeatability. 34
20 Example:daxpy (scalar) void daxpy(double *x, double *y, double a, int n) { for (int i = 0; i < n; i++) { y[i] = a * x[i] + y[i]; } } // x0 = &x[0] // x1 = &y[0] // x2 = &a // x3 = &n daxpy_: ldrsw x3, [x3] mov x4, #0 ldr d0, [x2] b.latch.loop: ldr d1, [x0, x4, lsl #3] ldr d2, [x1, x4, lsl #3] fmadd d2, d1, d0, d2 str d2, [x1, x4, lsl #3] add x4, x4, #1.latch: cmp x4, x3 b.lt.loop ret 36
21 daxpy (SVE) daxpy (scalar) daxpy_:.loop:.latch: ldrsw x3, [x3] mov x4, #0 whilelt p0.d, x4, x3 ld1rd z0.d, p0/z, [x2] ld1d z1.d, p0/z, [x0, x4, lsl #3] ld1d z2.d, p0/z, [x1, x4, lsl #3] fmla z2.d, p0/m, z1.d, z0.d st1d z2.d, p0, [x1, x4, lsl #3] incd x4 whilelt b.first ret p0.d, x4, x3.loop daxpy_:.loop:.latch: ldrsw x3, [x3] mov x4, #0 ldr d0, [x2] b.latch ldr d1, [x0, x4, lsl #3] ldr d2, [x1, x4, lsl #3] fmadd d2, d1, d0, d2 str d2, [x1, x4, lsl #3] add x4, x4, #1 cmp b.lt ret x4, x3.loop Q1: How do we handle the non-multiples of VL? Q2: What happens at different vector lengths? check bonus slides 37
22 Vector Architecture Design Trade-offs SLC SLC MC MC Example of Cache Coherent Vector Microarchitecture Cores with one or more SVE and Load/Store (LS) unit(s) NoC Private L1 and L2 caches (considered part of the core) System-level cache (SLC) shared among cores L2 Memory controllers (MC) L1 Network on chip (NoC) interconnects cores, SLCs and MCs SVE LS Core Core Disclaimer: Logical representation, not representative of a physical implementation 38
23 Simulation Tools: gem5 and DynamoRIO+ARMIE
24 gem5 gem5: open-source sophisticated processor and computer system simulator Runs real workloads Analyze workloads that customers use and care about including complex workloads such as Android Comprehensive model library Memory and I/O devices Full OS, Web browsers Rapid early prototyping But not a microarchitectural model out of the box! Ubuntu (Linux 4.x) Android Nougat New ideas can be tested quickly System-level impact can be quantified System-level insights Enables us to study complex memory-system 44
25 When Not to Use gem5 Performance validation gem5 is not a cycle-accurate microarchitecture model for any specific product This typically requires more accurate models such as RTL simulation. Commercial products such as Arm CycleModels operate in this space. Core microarchitecture exploration Only do this if you have a custom, detailed, CPU model! gem5 s core models were not designed to replace more accurate microarchitectural models. To validate functional correctness or test bleeding-edge ISA improvements gem5 is not as rigorously tested as commercial products. New (ARMv8.0+) or optional instructions are sometimes not implemented Commercial products such as Arm FastModels offer better reliability in this space. 45
26 Arm Research Enablement Kit System Modeling Using gem5 System-on-chip (SoC) systems are complex and expensive to prototype Simulation is a cost-effective way to evaluate new ideas gem5 models various system components, such as different CPU models, interconnection topologies, memory subsystems, etc. gem5 also models the interactions among system components Arm Research Enablement Kit is a guide through Arm-based system modeling using the gem5 simulator and a 64-bit processor model based on Armv8-A. 46
27 SVE Support in gem5 Commitment from Arm Research to open-source full support for SVE in gem5, with the goal of enabling research around SVE (and vector architectures in general) Main contributors: Giacomo Gabrielli Rekai Gonzalez Alberquilla Javier Setoain Gabor Dosza Still technically a work-in-progress, but most of the functionality is in place! 47
28 Where Do I Get It From? > git clone > git checkout sve/beta1 sve/beta1 is the latest development branch the plan is to merge all changes onto master after external review process Check README_SVE.md for up-to-date list of known issues and limitations 48
29 DynamoRIO Open-source BSD project Support for AArch32, AArch64 (added by Arm Research), IA-32, and x86-64 Enable runtime code manipulation system that supports code transformations on any part of a program, while it executes. Provides utilities for memory address tracing Opportunity for custom instrumentations via clients 49
30 DynamoRIO + ArmIE SVE DynamoRIO does not offer (yet!) natively SVE support Arm Instruction Emulator (ArmIE) is used to execute and instrument any SVE instructions Memory mapped shared counter ensures output correctness for traces 3 Clients currently supported: Memory Tracing Instruction Tracing Instruction Count Individual traces can be merged and analyze via additional post-processing scripts DynamoRIO ArmIE SVE binary SVE instrumentation AArch64 instrumentation Shared Counter 51
31 DynamoRIO + ArmIE SVE Code intrinsics must be added to the applications to enable instrumentation These define the region-of-interest to evaluate DynamoRIO ArmIE SVE binary Shared Counter #define START_TRACE() { } #define STOP_TRACE() { } Full compatibility with single-threaded apps Multi-thread support currently under work SVE instrumentation AArch64 instrumentation 52
32 DynamoRIO + ARMIE for SVE Tracing Run ARMIE within DR Introduces noise to the instrumentation Avoid tracing ARMIE s memory accesses ARMIE library memory space is different from the application s memory space DR will only generate traces inside the application s memory region Memory mapped shared counter Atomically accessed by ARMIE and DR Post-process script merges traces based on the sequence number freedom of do your own analysis! ARM Executable LDR STR ADD SUB SVE LOAD SVE ADD LDR SVE STR DynamoRIO ARMIE Non-SVE 0x8000 0x x8004 0x x801C 0x x8000 0x x8004 0x x8014 0x x8014 0x x801C 0x x8020 0x x8020 0x40008 S SVE 0x8014 0x x8014 0x x8020 0x x8020 0x40008 S Merged trace Shared Counter 53
33 SVE Ecosystem
34 SVE Programming How to get started Computational Scientists, Domain Experts Assembly (not suggested) Full ISA Specification: The Scalable Vector Extension for ARMv8-A Lots of worked examples, see A sneak peek into SVE and VLA programming Intrinsics (only if truly needed) Arm C Language Extensions for SVE Arm Scalable Vector Extensions and application to Machine Learning Auto-vectorization via GCC, Arm Compiler for HPC, Cray, Fujitsu Compiler Hints to the compiler via OpenMP Best practice writing code #pragma omp parallel for simd 55
35 SVE Tools How to get started Performance Specialists, Computer Architects Arm Compiler Arm Instruction Emulator Arm v8-a Binary Arm v8-a Binary with new features Arm Instruction Emulator Linux Arm v8-a 56 Research Enablement Kit
36 Commercial C/C++/Fortran compiler with best-in-class performance Compilers tuned for Scientific Computing and HPC Latest features and performance optimizations Commercially supported by Arm Tuned for Scientific Computing, HPC and Enterprise workloads Processor-specific optimizations for various server-class Arm-based platforms Optimal shared-memory parallelism using latest Arm-optimized OpenMP runtime Linux user-space compiler with latest features C++ 14 and Fortran 2003 language support with OpenMP 4.5* Support for Armv8-A and SVE architecture extension Based on LLVM and Flang, leading open-source compiler projects Commercially supported by Arm Available for a wide range of Arm-based platforms running leading Linux distributions RedHat, SUSE and Ubuntu 57
37 Arm Compiler Building on LLVM, Clang and Flang projects Arm C/C++/Fortran Compiler C/C++ Files (.c/.cpp) C/C++ Frontend Clang based Optimizer LLVM based ARMv8-A code-gen LLVM based ARMv8-A binary Fortran Files (.f/.f90) PGI Flang based Fortran Frontend LLVM IR IR Optimizations Auto-vectorization Enhanced optimization for ARMv8-A and SVE LLVM IR SVE code-gen LLVM based SVE binary Language specific frontend Language agnostic optimization Architecture specific backend 58
38 Status of GCC for SVE Making progress towards the aim of getting SVE support into GCC 8 GCC support was developed by a team in Arm and released publicly in Nov 2016 This year Linaro have been updating and incorporating those changes into upstream GCC Nearly 60% of the patches have been committed and a further 30% have been accepted but can t be committed yet Aim is to get the rest into GCC 8, due for release in Q Source: Arm HPC Workshop Tokyo 2017 by Linaro
39 Features of GCC for SVE Summary of features and scope of changes Vector-length agnostic and vector-length specific code generation Auto-vectorisation, including: Fully-predicated loops Predicated structure loads and stores Gather loads & scatter stores Aliasing checks with variable strides Non-associative reductions (FADDA) No support for SVE intrinsics yet: aim is to add that to GCC 9 61 Source: Arm HPC Workshop Tokyo 2017 by Linaro
40 Arm Instruction Emulator Develop your user-space applications for future hardware today Run Linux user-space code that uses new hardware features (SVE) on current Arm hardware Simple black box command line tool $ armclang hello.c --march=armv8+sve $./a.out Illegal instruction $ armie msve-vector-bits=256./a.out Hello Arm v8-a Binary Arm v8-a Binary with new features Arm Instruction Emulator Converts unsupported instructions to native Armv8-A instructions Linux Arm v8-a 62
41 Evaluating SVE Compile Emulate Analyse Arm Compiler C/C++/Fortran code SVE via auto-vectorization, intrinsics and assembly. Compiler Insight: Compiler places results of compiletime decisions and analysis in the resulting binary. Arm Instruction Emulator Runs userspace binaries for future Arm architectures on today s systems. Supported instructions run unmodified. Unsupported instructions are trapped and emulated. Arm Code Advisor Console or web-based output shows prioritized advice in-line with original source code. 63
42 Arm Code Advisor Combines static and dynamic information to produce actionable insights Performance Advice Compiler vectorization hints Compilation flags advice OpenMP instrumentation Insight from compilation and runtime Compiler insights are embedded into the application binary Instrumented OpenMP runtime traces OpenMP events Extensible Architecture User can write plugins to add their own analysis information Data accessible via web browser or command line 64
43 Arm Code Advisor Typical workflow 65
44 Arm Code Advisor Screenshot 66
45 Exploring SVE for scientific applications Objective: understand if apps cat benefit from SVE, assess quality and readiness of tools Various Arm-based SoC (Huawei Taishan) Several applications of interest: QE, KKRnano, GRID, BQCD Results on MiniKKR show Arm-based SoC (no SVE) similar performance versus x86 Estimate performance using instruction/branch counting (dynamic) and critical path analysis (static) Early Experience with ARM SVE, presented at SC 17 Arm SVE User Meeting by D. Pleiter (JSC) Exploring SVE for scientific applications, presented at HiPEAC 18 by S. Nassyr (JSC) 67
46 Full Application Enablement
47 Arm Allinea Studio A quick glance at what is in Arm Allinea Studio (latest 18.3) C/C++ Compiler Fortran Compiler Performance Libraries Forge (DDT and MAP) Performance Reports C++ 14 support OpenMP 4.5 without offloading SVE ready Fortran 2003 support Partial Fortran 2008 support OpenMP 3.1 SVE ready Optimized math libraries BLAS, LAPACK and FFT Threaded parallelism with OpenMP Profile, Tune and Debug Scalable debugging with DDT Parallel Profiling with MAP Analyze your application Memory, MPI, Threads, I/O, CPU metrics Tuned by Arm for a wide-range of server-class Arm-based platforms 69
48 Optimized BLAS, LAPACK and FFT Commercially supported by Arm Best serial and parallel performance Commercial 64-bit Armv8-A math libraries Commonly used low-level math routines - BLAS, LAPACK and FFT FFTW compatible interface for FFT routines Batch BLAS support Best serial and parallel performance Generic Armv8-A optimizations by Arm Tuning for specific platforms like Cavium ThunderX2 in collaboration with silicon vendors Validated and supported by Arm Engineers Available for a wide range of server-class Arm-based platforms Validated with NAG s test suite, a de-facto standard Validated with NAG test suite 70
49 18.2 release vs FFTW FFT timings on TX2 1-d complex-complex double precision Higher means new Arm Performance Libraries FFT implementation better than FFTW Results show: speed-up over FFTW 18.1 GROMACS LAMMPS NAMD Quantum ESPRESSO CASTEP 18.2 speeds up over FFTW Average about 30% faster than FFTW Lots of this comes from better usage of NEON vectors on TX2 Cases where we are still slower are: Very small (no major issue) Powers of primes better FFTW better
50 Arm Performance Library 18.3 release $ armclang code_with_math_routines.c \ -L${ARMPL_LIBRARIES} lamath $ armflang code_with_math_routines.f \ -L${ARMPL_LIBRARIES} -lamath 72
51 Arm Forge An interoperable toolkit for debugging and profiling Commercially supported by Arm Fully Scalable The de-facto standard for HPC development Available on the vast majority of the Top500 machines in the world Fully supported by Arm on x86, IBM Power, Nvidia GPUs, etc. State-of-the art debugging and profiling capabilities Powerful and in-depth error detection mechanisms (including memory debugging) Sampling-based profiler to identify and understand bottlenecks Available at any scale (from serial to petaflopic applications) Easy to use by everyone Unique capabilities to simplify remote interactive sessions Innovative approach to present quintessential information to users Very user-friendly 73
52 Software Ecosystem HPC Applications Porting GROMACS NAMD LAMMPS CESM MrBayes Bowtie AMBER Paraview SIESTA VMD WRF Quantum ESPRESSO CP2k MILC GEANT4 OpenFOAM GAMESS VisIT DL_POLY NEMO BLAST NWCHEM Abinit BWA QMCPACK Build recipes online at 74
53 Arm HPC Ecosystem website House for Arm s HPC ecosystem, information channels, and collaboration Latest events, news, blogs, and collateral including webinars, whitepapers and presentations Links to HPC open-source & commercial software packages Recipes for porting HPC apps Curated and moderated by Arm New Arm HPC User Group Forum 75
54 Wiki Dynamic list of common HPC applications Provides focus for porting progress Community driven. Maintained by Arm, but anyone can join and contribute. Allows developers to share recipes, and learn from progress on other applications Provides a mechanism for tracking status of applications and package sets (e.g. OpenHPC packages, Mantevo, etc.) Up-to-date summary of package status 76
55 17-19 September 2018 Robinson College, Cambridge, UK Architecture & Memory Biotechnology Flexible Electronics High Performance Computing Internet of Things Large Scale Systems Machine Learning Security & Servers Agenda available online, early bird expires July 31 st! arm.com/summit
56 Thank You! Danke! Merci!!! Gracias! Kiitos! 78 Thanks to Eric Van Hensbergen, Chris Goodyer, Miguel Tairum- Cruz, Alex Rico, Jose Joao, Giacomo Gabrielli, Olly Perks
57 The Arm trademarks featured in this presentation are registered trademarks or trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. 79
Arm's role in co-design for the next generation of HPC platforms
Arm's role in co-design for the next generation of HPC platforms Filippo Spiga Software and Large Scale Systems What it is Co-design? Abstract: Preparations for Exascale computing have led to the realization
More informationArm crossplatform. VI-HPS platform October 16, Arm Limited
Arm crossplatform tools VI-HPS platform October 16, 2018 An introduction to Arm Arm is the world's leading semiconductor intellectual property supplier We license to over 350 partners: present in 95% of
More informationArm in High Performance Computing: Fortran on AArch64
Arm in High Performance Computing: Fortran on AArch64 Nathan Sircombe Arm Manchester nathan.sircombe@arm.com 70% of the world s population uses Arm technology 2 Total computing experience Consumer Arm
More informationSoftware Ecosystem for Arm-based HPC
Software Ecosystem for Arm-based HPC CUG 2018 - Stockholm Florent.Lebeau@arm.com Ecosystem for HPC List of components needed: Linux OS availability Compilers Libraries Job schedulers Debuggers Profilers
More informationARM High Performance Computing
ARM High Performance Computing Eric Van Hensbergen Distinguished Engineer, Director HPC Software & Large Scale Systems Research IDC HPC Users Group Meeting Austin, TX September 8, 2016 ARM 2016 An introduction
More informationArm in HPC. Toshinori Kujiraoka Sales Manager, APAC HPC Tools Arm Arm Limited
Arm in HPC Toshinori Kujiraoka Sales Manager, APAC HPC Tools Arm 2019 Arm Limited Arm Technology Connects the World Arm in IOT 21 billion chips in the past year Mobile/Embedded/IoT/ Automotive/GPUs/Servers
More informationARM Performance Libraries Current and future interests
ARM Performance Libraries Current and future interests Chris Goodyer Senior Engineering Manager, HPC Software Workshop on Batched, Reproducible, and Reduced Precision BLAS 25 th February 2017 ARM Performance
More informationThe Arm Technology Ecosystem: Current Products and Future Outlook
The Arm Technology Ecosystem: Current Products and Future Outlook Dan Ernst, PhD Advanced Technology Cray, Inc. Why is an Ecosystem Important? An Ecosystem is a collection of common material Developed
More informationBeyond Hardware IP An overview of Arm development solutions
Beyond Hardware IP An overview of Arm development solutions 2018 Arm Limited Arm Technical Symposia 2018 Advanced first design cost (US$ million) IC design complexity and cost aren t slowing down 542.2
More informationProfiling and Debugging OpenCL Applications with ARM Development Tools. October 2014
Profiling and Debugging OpenCL Applications with ARM Development Tools October 2014 1 Agenda 1. Introduction to GPU Compute 2. ARM Development Solutions 3. Mali GPU Architecture 4. Using ARM DS-5 Streamline
More informationPreliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths
Preliminary Performance Evaluation of Application Kernels using ARM SVE with Multiple Vector Lengths Y. Kodama, T. Odajima, M. Matsuda, M. Tsuji, J. Lee and M. Sato RIKEN AICS (Advanced Institute for Computational
More informationInnovative Alternate Architecture for Exascale Computing. Surya Hotha Director, Product Marketing
Innovative Alternate Architecture for Exascale Computing Surya Hotha Director, Product Marketing Cavium Corporate Overview Enterprise Mobile Infrastructure Data Center and Cloud Service Provider Cloud
More informationThe Mont-Blanc project Updates from the Barcelona Supercomputing Center
montblanc-project.eu @MontBlanc_EU The Mont-Blanc project Updates from the Barcelona Supercomputing Center Filippo Mantovani This project has received funding from the European Union's Horizon 2020 research
More informationToward Building up ARM HPC Ecosystem
Toward Building up ARM HPC Ecosystem Shinji Sumimoto, Ph.D. Next Generation Technical Computing Unit FUJITSU LIMITED Sept. 12 th, 2017 0 Outline Fujitsu s Super computer development history and Post-K
More informationA sneak peek into SVE and VLA programming
A sneak peek into SVE and VLA programming Francesco Petrogalli September 2018 Page 1 of 19 Contents Contents 2 Introduction 2 1 SVE: an overview 3 New architectural registers...............................................
More informationARM instruction sets and CPUs for wide-ranging applications
ARM instruction sets and CPUs for wide-ranging applications Chris Turner Director, CPU technology marketing ARM Tech Forum Taipei July 4 th 2017 ARM computing is everywhere #1 shipping GPU in the world
More informationARMv8-A Scalable Vector Extension for Post-K. Copyright 2016 FUJITSU LIMITED
ARMv8-A Scalable Vector Extension for Post-K 0 Post-K Supports New SIMD Extension The SIMD extension is a 512-bit wide implementation of SVE SVE is an HPC-focused SIMD instruction extension in AArch64
More informationWebinar Tips and Tricks for Porting HPC Apps
Webinar Tips and Tricks for Porting HPC Apps 2018 Arm Limited John C. Linford 6 December 2018 Arm is changing the game in HPC Sandia s Astra supercomputer is the first Arm-based
More informationSUSE Linux Entreprise Server for ARM
FUT89013 SUSE Linux Entreprise Server for ARM Trends and Roadmap Jay Kruemcke Product Manager jayk@suse.com @mr_sles ARM Overview ARM is a Reduced Instruction Set (RISC) processor family British company,
More informationPost-K: Building the Arm HPC Ecosystem
Post-K: Building the Arm HPC Ecosystem Toshiyuki Shimizu FUJITSU LIMITED Nov. 14th, 2017 Exhibitor Forum, SC17, Nov. 14, 2017 0 Post-K: Building up Arm HPC Ecosystem Fujitsu s approach for HPC Approach
More informationOptimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs
Optimizing Cache Coherent Subsystem Architecture for Heterogeneous Multicore SoCs Niu Feng Technical Specialist, ARM Tech Symposia 2016 Agenda Introduction Challenges: Optimizing cache coherent subsystem
More informationBootstrapping a HPC Ecosystem
Bootstrapping a HPC Ecosystem Eric Van Hensbergen Fellow Senior Director of HPC Software and Large Scale Systems Research Teratech Forum June 19, 2018 Copyright ARM computing is everywhere #1 shipping
More informationEnabling the ARM high performance computing (HPC) software ecosystem
Enabling the ARM high performance computing (HPC) software ecosystem Ashok Bhat Product manager, HPC and Server tools ARM Tech Symposia India December 7th 2016 Are these supercomputers? For example, the
More informationGOING ARM A CODE PERSPECTIVE
GOING ARM A CODE PERSPECTIVE ISC18 Guillaume Colin de Verdière JUNE 2018 GCdV PAGE 1 CEA, DAM, DIF, F-91297 Arpajon, France June 2018 A history of disruptions All dates are installation dates of the machines
More informationThe Mont-Blanc approach towards Exascale
http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are
More informationArm Processor Technology Update and Roadmap
Arm Processor Technology Update and Roadmap ARM Processor Technology Update and Roadmap Cavium: Giri Chukkapalli is a Distinguished Engineer in the Data Center Group (DCG) Introduction to ARM Architecture
More informationComprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications
Comprehensive Arm Solutions for Innovative Machine Learning (ML) and Computer Vision (CV) Applications Helena Zheng ML Group, Arm Arm Technical Symposia 2017, Taipei Machine Learning is a Subset of Artificial
More informationARM processors driving automotive innovation
ARM processors driving automotive innovation Chris Turner Director of advanced technology marketing, CPU group ARM tech forums, Seoul and Taipei June/July 2016 The ultimate intelligent connected device
More informationToward Building up Arm HPC Ecosystem --Fujitsu s Activities--
Toward Building up Arm HPC Ecosystem --Fujitsu s Activities-- Shinji Sumimoto, Ph.D. Next Generation Technical Computing Unit FUJITSU LIMITED Jun. 28 th, 2018 0 Copyright 2018 FUJITSU LIMITED Outline of
More informationComparative Benchmarking of the First Generation of HPC-Optimised Arm Processors on Isambard
Prof Simon McIntosh-Smith Isambard PI University of Bristol / GW4 Alliance Comparative Benchmarking of the First Generation of HPC-Optimised Arm Processors on Isambard Isambard system specification 10,000+
More informationEach Milliwatt Matters
Each Milliwatt Matters Ultra High Efficiency Application Processors Govind Wathan Product Manager, CPG ARM Tech Symposia China 2015 November 2015 Ultra High Efficiency Processors Used in Diverse Markets
More informationModern Processor Architectures. L25: Modern Compiler Design
Modern Processor Architectures L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant minimising the number of instructions
More informationThe Bifrost GPU architecture and the ARM Mali-G71 GPU
The Bifrost GPU architecture and the ARM Mali-G71 GPU Jem Davies ARM Fellow and VP of Technology Hot Chips 28 Aug 2016 Introduction to ARM Soft IP ARM licenses Soft IP cores (amongst other things) to our
More informationAddressing the Increasing Challenges of Debugging on Accelerated HPC Systems. Ed Hinkel Senior Sales Engineer
Addressing the Increasing Challenges of Debugging on Accelerated HPC Systems Ed Hinkel Senior Sales Engineer Agenda Overview - Rogue Wave & TotalView GPU Debugging with TotalView Nvdia CUDA Intel Phi 2
More informationModern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design
Modern Processor Architectures (A compiler writer s perspective) L25: Modern Compiler Design The 1960s - 1970s Instructions took multiple cycles Only one instruction in flight at once Optimisation meant
More informationAMD S X86 OPEN64 COMPILER. Michael Lai AMD
AMD S X86 OPEN64 COMPILER Michael Lai AMD CONTENTS Brief History AMD and Open64 Compiler Overview Major Components of Compiler Important Optimizations Recent Releases Performance Applications and Libraries
More informationAMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING FELLOW 3 OCTOBER 2016
AMD ACCELERATING TECHNOLOGIES FOR EXASCALE COMPUTING BILL.BRANTLEY@AMD.COM, FELLOW 3 OCTOBER 2016 AMD S VISION FOR EXASCALE COMPUTING EMBRACING HETEROGENEITY CHAMPIONING OPEN SOLUTIONS ENABLING LEADERSHIP
More informationEvolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University Scalable Tools Workshop 7 August 2017
Evolving HPCToolkit John Mellor-Crummey Department of Computer Science Rice University http://hpctoolkit.org Scalable Tools Workshop 7 August 2017 HPCToolkit 1 HPCToolkit Workflow source code compile &
More informationThe challenge of SVE in QEMU. Alex Bennée Senior Virtualization Engineer
The challenge of SVE in QEMU Alex Bennée Senior Virtualization Engineer alex.bennee@linaro.org What is ARM s SVE? Why implement SVE in QEMU? How does QEMU s TCG work? Challenges for QEMU s emulation Current
More informationS Comparing OpenACC 2.5 and OpenMP 4.5
April 4-7, 2016 Silicon Valley S6410 - Comparing OpenACC 2.5 and OpenMP 4.5 James Beyer, NVIDIA Jeff Larkin, NVIDIA GTC16 April 7, 2016 History of OpenMP & OpenACC AGENDA Philosophical Differences Technical
More informationTools and Methodology for Ensuring HPC Programs Correctness and Performance. Beau Paisley
Tools and Methodology for Ensuring HPC Programs Correctness and Performance Beau Paisley bpaisley@allinea.com About Allinea Over 15 years of business focused on parallel programming development tools Strong
More informationBringing Intelligence to Enterprise Storage Drives
Bringing Intelligence to Enterprise Storage Drives Neil Werdmuller Director Storage Solutions Arm Santa Clara, CA 1 Who am I? 28 years experience in embedded Lead the storage solutions team Work closely
More informationTransforming the Data Center with ARM
WHITE PAPER Transforming the Data Center with ARM Maximizing Energy, Scalability, and Performance in the Modern Data Center IT leaders looking to modernize their computer, networking and storage systems
More informationHPC Network Stack on ARM
HPC Network Stack on ARM Pavel Shamis (Pasha) Principal Research Engineer ARM Research ExaComm 2017 06/22/2017 HPC network stack on ARM? 2 Serious ARM HPC deployments starting in 2017 ARM Emerging CPU
More informationBuilding supercomputers from embedded technologies
http://www.montblanc-project.eu Building supercomputers from embedded technologies Alex Ramirez Barcelona Supercomputing Center Technical Coordinator This project and the research leading to these results
More informationOpenMP 4.0: A Significant Paradigm Shift in Parallelism
OpenMP 4.0: A Significant Paradigm Shift in Parallelism Michael Wong OpenMP CEO michaelw@ca.ibm.com http://bit.ly/sc13-eval SC13 OpenMP 4.0 released 2 Agenda The OpenMP ARB History of OpenMP OpenMP 4.0
More informationBifrost - The GPU architecture for next five billion
Bifrost - The GPU architecture for next five billion Hessed Choi Senior FAE / ARM ARM Tech Forum June 28 th, 2016 Vulkan 2 ARM 2016 What is Vulkan? A 3D graphics API for the next twenty years Logical successor
More informationSIMD Exploitation in (JIT) Compilers
SIMD Exploitation in (JIT) Compilers Hiroshi Inoue, IBM Research - Tokyo 1 What s SIMD? Single Instruction Multiple Data Same operations applied for multiple elements in a vector register input 1 A0 input
More informationThe Stampede is Coming: A New Petascale Resource for the Open Science Community
The Stampede is Coming: A New Petascale Resource for the Open Science Community Jay Boisseau Texas Advanced Computing Center boisseau@tacc.utexas.edu Stampede: Solicitation US National Science Foundation
More informationAccelerating Insights In the Technical Computing Transformation
Accelerating Insights In the Technical Computing Transformation Dr. Rajeeb Hazra Vice President, Data Center Group General Manager, Technical Computing Group June 2014 TOP500 Highlights Intel Xeon Phi
More informationPost-K Supercomputer Overview. Copyright 2016 FUJITSU LIMITED
Post-K Supercomputer Overview 1 Post-K supercomputer overview Developing Post-K as the successor to the K computer with RIKEN Developing HPC-optimized high performance CPU and system software Selected
More informationIndustrial-Strength High-Performance RISC-V Processors for Energy-Efficient Computing
Industrial-Strength High-Performance RISC-V Processors for Energy-Efficient Computing Dave Ditzel dave@esperanto.ai President and CEO Esperanto Technologies, Inc. 7 th RISC-V Workshop November 28, 2017
More informationThe Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations
The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations Ophir Maor HPC Advisory Council ophir@hpcadvisorycouncil.com The HPC-AI Advisory Council World-wide HPC non-profit
More informationEuropean energy efficient supercomputer project
http://www.montblanc-project.eu European energy efficient supercomputer project Simon McIntosh-Smith University of Bristol (Based on slides from Alex Ramirez, BSC) Disclaimer: Speaking for myself... All
More informationOmpCloud: Bridging the Gap between OpenMP and Cloud Computing
OmpCloud: Bridging the Gap between OpenMP and Cloud Computing Hervé Yviquel, Marcio Pereira and Guido Araújo University of Campinas (UNICAMP), Brazil A bit of background qguido Araujo, PhD Princeton University
More informationApril 2 nd, Bob Burroughs Director, HPC Solution Sales
April 2 nd, 2019 Bob Burroughs Director, HPC Solution Sales Today - Introducing 2 nd Generation Intel Xeon Scalable Processors how Intel Speeds HPC performance Work Time System Peak Efficiency Software
More informationPerformance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models. Jason Andrews
Performance Optimization for an ARM Cortex-A53 System Using Software Workloads and Cycle Accurate Models Jason Andrews Agenda System Performance Analysis IP Configuration System Creation Methodology: Create,
More informationBringing Intelligence to Enterprise Storage Drives
Bringing Intelligence to Enterprise Storage Drives Neil Werdmuller Director Storage Solutions Arm Santa Clara, CA 1 Who am I? 28 years experience in embedded Lead the storage solutions team Work closely
More informationDesigning, developing, debugging ARM Cortex-A and Cortex-M heterogeneous multi-processor systems
Designing, developing, debugging ARM and heterogeneous multi-processor systems Kinjal Dave Senior Product Manager, ARM ARM Tech Symposia India December 7 th 2016 Topics Introduction System design Software
More informationWAVE ONE MAINFRAME WAVE THREE INTERNET WAVE FOUR MOBILE & CLOUD WAVE TWO PERSONAL COMPUTING & SOFTWARE Arm Limited
WAVE ONE MAINFRAME WAVE THREE INTERNET WAVE FOUR MOBILE & CLOUD WAVE TWO PERSONAL COMPUTING & SOFTWARE Artificial Intelligence Fifth wave Data-driven computing era IoT Generating data 5G 5G Transporting
More informationUpdate of Post-K Development Yutaka Ishikawa RIKEN AICS
Update of Post-K Development Yutaka Ishikawa RIKEN AICS 11:20AM 11:40AM, 2 nd of November, 2017 FLAGSHIP2020 Project Missions Building the Japanese national flagship supercomputer, post K, and Developing
More informationThe Benefits of GPU Compute on ARM Mali GPUs
The Benefits of GPU Compute on ARM Mali GPUs Tim Hartley 1 SEMICON Europa 2014 ARM Introduction World leading semiconductor IP Founded in 1990 1060 processor licenses sold to more than 350 companies >
More informationDeveloping, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge
Developing, Debugging, and Optimizing GPU Codes for High Performance Computing with Allinea Forge Ryan Hulguin Applications Engineer ryan.hulguin@arm.com Agenda Introduction Overview of Allinea Products
More informationNew Approaches to Connected Device Security
New Approaches to Connected Device Security Erik Jacobson Architecture Marketing Director Arm Arm Techcon 2017 - If you connect it to the Internet, someone will try to hack it. - If what you put on the
More informationSupercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC?
Supercomputing with Commodity CPUs: Are Mobile SoCs Ready for HPC? Nikola Rajovic, Paul M. Carpenter, Isaac Gelado, Nikola Puzovic, Alex Ramirez, Mateo Valero SC 13, November 19 th 2013, Denver, CO, USA
More informationCLICK TO EDIT MASTER TITLE STYLE. Click to edit Master text styles. Second level Third level Fourth level Fifth level
CLICK TO EDIT MASTER TITLE STYLE Second level THE HETEROGENEOUS SYSTEM ARCHITECTURE ITS (NOT) ALL ABOUT THE GPU PAUL BLINZER, FELLOW, HSA SYSTEM SOFTWARE, AMD SYSTEM ARCHITECTURE WORKGROUP CHAIR, HSA FOUNDATION
More informationARM TrustZone for ARMv8-M for software engineers
ARM TrustZone for ARMv8-M for software engineers Ashok Bhat Product Manager, HPC and Server tools ARM Tech Symposia India December 7th 2016 The need for security Communication protection Cryptography,
More informationThe Next Steps in the Evolution of ARM Cortex-M
The Next Steps in the Evolution of ARM Cortex-M Joseph Yiu Senior Embedded Technology Manager CPU Group ARM Tech Symposia China 2015 November 2015 Trust & Device Integrity from Sensor to Server 2 ARM 2015
More informationPedraforca: a First ARM + GPU Cluster for HPC
www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu
More informationARM mbed: Internet of Possible
ARM mbed: Internet of Possible Bill Woo Director ISG Sales El Tower / 2017 Tech Forum June 28, 2017 Introduction Today enterprises are under pressure to unlock the value in the Internet of Things. Our
More informationThe Changing Face of Edge Compute
The Changing Face of Edge Compute 2018 Arm Limited Alvin Yang Nov 2018 Market trends acceleration of technology deployment 26 years 4 years 100 billion chips shipped 100 billion chips shipped 1 Trillion
More informationSYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS
SYSTEMS ON CHIP (SOC) FOR EMBEDDED APPLICATIONS Embedded System System Set of components needed to perform a function Hardware + software +. Embedded Main function not computing Usually not autonomous
More informationBig.LITTLE Processing with ARM Cortex -A15 & Cortex-A7
Big.LITTLE Processing with ARM Cortex -A15 & Cortex-A7 Improving Energy Efficiency in High-Performance Mobile Platforms Peter Greenhalgh, ARM September 2011 This paper presents the rationale and design
More informationBarcelona Supercomputing Center
www.bsc.es Barcelona Supercomputing Center Centro Nacional de Supercomputación EMIT 2016. Barcelona June 2 nd, 2016 Barcelona Supercomputing Center Centro Nacional de Supercomputación BSC-CNS objectives:
More informationThe Mont-Blanc Project
http://www.montblanc-project.eu The Mont-Blanc Project Daniele Tafani Leibniz Supercomputing Centre 1 Ter@tec Forum 26 th June 2013 This project and the research leading to these results has received funding
More informationTrends in HPC (hardware complexity and software challenges)
Trends in HPC (hardware complexity and software challenges) Mike Giles Oxford e-research Centre Mathematical Institute MIT seminar March 13th, 2013 Mike Giles (Oxford) HPC Trends March 13th, 2013 1 / 18
More informationThe Cray Programming Environment. An Introduction
The Cray Programming Environment An Introduction Vision Cray systems are designed to be High Productivity as well as High Performance Computers The Cray Programming Environment (PE) provides a simple consistent
More informationRISC-V: Enabling a New Era of Open Data-Centric Computing Architectures
Presentation Brief RISC-V: Enabling a New Era of Open Data-Centric Computing Architectures Delivers Independent Resource Scaling, Open Source, and Modular Chip Design for Big Data and Fast Data Environments
More informationARMv8-A Next-Generation Vector Architecture for HPC
ARMv8-A Next-Generation Vector Architecture for HPC Nigel Stephens Lead ISA Architect and ARM Fellow Hot Chips 28, Cupertino August 22, 2016 Expanding ARMv8 vector processing ARMv7 Advanced SIMD (aka ARM
More informationHETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE
HETEROGENEOUS SYSTEM ARCHITECTURE: PLATFORM FOR THE FUTURE Haibo Xie, Ph.D. Chief HSA Evangelist AMD China OUTLINE: The Challenges with Computing Today Introducing Heterogeneous System Architecture (HSA)
More informationProductive Performance on the Cray XK System Using OpenACC Compilers and Tools
Productive Performance on the Cray XK System Using OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 The New Generation of Supercomputers Hybrid
More informationProgramming for the Intel Many Integrated Core Architecture By James Reinders. The Architecture for Discovery. PowerPoint Title
Programming for the Intel Many Integrated Core Architecture By James Reinders The Architecture for Discovery PowerPoint Title Intel Xeon Phi coprocessor 1. Designed for Highly Parallel workloads 2. and
More informationWhat is gem5 and where do I get it?
What is gem5 and where do I get it? Andreas Sandberg & Nikos Nikoleris ARM Research Why gem5? Runs real workloads Runs complex workloads like Android & ChromeOS System-level insights Device interactions
More information2017 Arm Limited. How to design an IoT SoC and get Arm CPU IP for no upfront license fee
2017 Arm Limited How to design an IoT SoC and get Arm CPU IP for no upfront license fee An enhanced Arm DesignStart Building on a strong foundation Successfully used by 1000s of designers, researchers
More informationARMv8-A CPU Architecture Overview
ARMv8-A CPU Architecture Overview Chris Shore Training Manager, ARM ARM Game Developer Day, London 03/12/2015 Chris Shore ARM Training Manager With ARM for 16 years Managing customer training for 15 years
More informationTECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING
TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING Accelerated computing is revolutionizing the economics of the data center. HPC and hyperscale customers deploy accelerated
More informationCortex-A75 and Cortex-A55 DynamIQ processors Powering applications from mobile to autonomous driving
Cortex-A75 and Cortex- DynamIQ processors Powering applications from mobile to autonomous driving Lionel Belnet Sr. Product Manager Arm Arm Tech Symposia 2017 Agenda Market growth and trends DynamIQ technology
More informationScientific Programming in C XIV. Parallel programming
Scientific Programming in C XIV. Parallel programming Susi Lehtola 11 December 2012 Introduction The development of microchips will soon reach the fundamental physical limits of operation quantum coherence
More informationTECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING
TECHNICAL OVERVIEW ACCELERATED COMPUTING AND THE DEMOCRATIZATION OF SUPERCOMPUTING Accelerated computing is revolutionizing the economics of the data center. HPC enterprise and hyperscale customers deploy
More informationPortable and Productive Performance with OpenACC Compilers and Tools. Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc.
Portable and Productive Performance with OpenACC Compilers and Tools Luiz DeRose Sr. Principal Engineer Programming Environments Director Cray Inc. 1 Cray: Leadership in Computational Research Earth Sciences
More informationPORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune
PORTING CP2K TO THE INTEL XEON PHI ARCHER Technical Forum, Wed 30 th July Iain Bethune (ibethune@epcc.ed.ac.uk) Outline Xeon Phi Overview Porting CP2K to Xeon Phi Performance Results Lessons Learned Further
More informationARMv8-A Software Development
ARMv8-A Software Development Course Description ARMv8-A software development is a 4 days ARM official course. The course goes into great depth and provides all necessary know-how to develop software for
More informationIndustry Collaboration and Innovation
Industry Collaboration and Innovation Industry Landscape Key changes occurring in our industry Historical microprocessor technology continues to deliver far less than the historical rate of cost/performance
More informationOpenACC Course. Office Hour #2 Q&A
OpenACC Course Office Hour #2 Q&A Q1: How many threads does each GPU core have? A: GPU cores execute arithmetic instructions. Each core can execute one single precision floating point instruction per cycle
More informationAn Introduction to OpenACC
An Introduction to OpenACC Alistair Hart Cray Exascale Research Initiative Europe 3 Timetable Day 1: Wednesday 29th August 2012 13:00 Welcome and overview 13:15 Session 1: An Introduction to OpenACC 13:15
More informationFUJITSU HPC and the Development of the Post-K Supercomputer
FUJITSU HPC and the Development of the Post-K Supercomputer Toshiyuki Shimizu Vice President, System Development Division, Next Generation Technical Computing Unit 0 November 16 th, 2016 Post-K is currently
More informationECMWF Workshop on High Performance Computing in Meteorology. 3 rd November Dean Stewart
ECMWF Workshop on High Performance Computing in Meteorology 3 rd November 2010 Dean Stewart Agenda Company Overview Rogue Wave Product Overview IMSL Fortran TotalView Debugger Acumem ThreadSpotter 1 Copyright
More informationExpanding Opportunities in Clamshell Devices. Laurence Bryant VP Strategic Marketing
Expanding Opportunities in Clamshell Devices Laurence Bryant VP Strategic Marketing 1 PC Mobile Ecosystem Scaling The Richness Of Small Screen Experiences The smartphone and tablet ecosystem is shaping
More informationParallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model
Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy
More informationA Fast Instruction Set Simulator for RISC-V
A Fast Instruction Set Simulator for RISC-V Maxim.Maslov@esperantotech.com Vadim.Gimpelson@esperantotech.com Nikita.Voronov@esperantotech.com Dave.Ditzel@esperantotech.com Esperanto Technologies, Inc.
More information