FUJITSU HPC and the Development of the Post-K Supercomputer

1 FUJITSU HPC and the Development of the Post-K Supercomputer
Toshiyuki Shimizu
Vice President, System Development Division, Next Generation Technical Computing Unit
November 16th, 2016
Post-K is currently under development. Information in these slides is subject to change without notice.

2 FUJITSU HPC and Post-K Development
- Introduction of HPC solutions and the HPC product portfolio
- High-end HPC supercomputer development
- Performance of the high-end machines preceding Post-K
- Post-K goals and approaches
- Post-K hardware
- Post-K software
- Performance balance
- Summary

3 FUJITSU HPC Solutions to Satisfy Customer Demands
- High-end supercomputers, both with Fujitsu-developed CPUs and as x86 cluster systems
- Single-system-image operation with FUJITSU system software
- High performance, high availability, and high reliability
Product portfolio (figure), from high-end through divisional and departmental to work-group systems:
- Supercomputer PRIMEHPC: K computer (co-developed with RIKEN), PRIMEHPC FX10, PRIMEHPC FX100; high scalability with a Fujitsu-developed CPU and interconnect
- PRIMERGY x86 clusters: CX400/CX600 (KNL), BX900/BX400, RX2530/RX; support the latest CPUs and accelerators
- Large-scale SMP system: RX900

4 FUJITSU High-end Supercomputer Development
Timeline (figure): Japan's national projects (HPCI strategic applications program, FS projects, operation of the K computer, Post-K computer development) alongside FUJITSU's own development (application review, PRIMEHPC FX10, PRIMEHPC FX100, Post-K).
- PRIMEHPC FX10: 1.8x the CPU performance of K; easier installation
- PRIMEHPC FX100: 4x (DP) / 8x (SP) the CPU performance of K; Tofu2; high-density packaging and lower energy
- Technical Computing Suite (TCS): handles millions of parallel jobs
  - OS: lower OS jitter with an assistant core
  - FEFS: super-scalable file system
  - MPI: ultra-scalable collective communication libraries
K computer and PRIMEHPC FX10/FX100 in operation
- The CPU and interconnect of the FX10/FX100 supercomputers inherit the K computer's architectural concept and feature state-of-the-art technologies.
- The TCS system software supports FUJITSU supercomputers.
- Many applications are currently running on these machines and being developed for science and various industries.
Post-K supercomputer
- RIKEN and FUJITSU are working together, with application R&D teams, to provide a successor to the K computer using a co-design approach.

5 Achievements with the K computer
Prestigious benchmark awards
- TOP500: 10.5 Pflops, 93% efficiency
- HPCG: 602 Tflops, 5.3% efficiency, No. 1
- Graph500: 38.6 TTEPS, No. 1
- HPC Challenge Class 1: No. 1 in all categories: (1) Global HPL, (2) Global RandomAccess, (3) EP STREAM, (4) Global FFT
Gordon Bell Prize awards
- First-principles calculation of electronic states of a silicon nanowire with 100,000 atoms on the K computer (2011)
- 4.45 Pflops Astrophysical N-Body Simulation on K Computer: The Gravitational Trillion-Body Problem (2012)
- At SC16, 6 years after shipment, nominated as a finalist: Simulations of Below-Ground Dynamics of Fungi: Pflops Attained by Automated Generation and Autotuning of Temporal Blocking Codes (2016 finalist)

6 Performance of FUJITSU High-end Machines
- FUJITSU's custom CPUs steadily increase their floating-point performance.
- Data bandwidth is not compromised, so applications can make the best use of the CPU.
- With the FX100, we introduced the SMaC concept, along with the assistant core (AC) and the CMG structure.

                         FX100             FX10         K computer
Available year           CY2015            CY2012       CY2010
Double flops / CPU       1 TF              235 GF       128 GF
Single flops / CPU       2 TF              235 GF       128 GF
SIMD width               256 bit           128 bit      128 bit
# of CMGs (cores/CMG*)   2 (16 + 1 AC*)    1 (16)       1 (8)
Memory BW                480 GB/s          83.5 GB/s    64 GB/s
Bytes per flop           0.4 ~ 0.5

*AC (Assistant Core): reduces OS jitter by processing I/O operations and asynchronous MPI handling.
*CMG (Core Memory Group): a group of CPU cores sharing the L2 cache and memory for efficient use of bandwidth.
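The bytes-per-flop (B/F) row is peak memory bandwidth divided by peak double-precision flops. A minimal C sketch using the per-CPU figures on this slide reproduces it; the struct and function names are illustrative, not from the slides.

```c
#include <stdio.h>

/* Peak figures per CPU, as listed on the slide (double precision). */
struct machine {
    const char *name;
    double dp_gflops;   /* double-precision peak, Gflop/s */
    double mem_gbs;     /* memory bandwidth, GB/s */
};

/* Bytes per flop: bytes of memory traffic available per floating-point
   operation when the CPU runs at peak. */
double bytes_per_flop(double mem_gbs, double dp_gflops) {
    return mem_gbs / dp_gflops;
}

/* Print the B/F ratio of each machine on the slide. */
void print_balance(void) {
    const struct machine m[] = {
        {"FX100",      1000.0, 480.0},
        {"FX10",        235.0,  83.5},
        {"K computer",  128.0,  64.0},
    };
    for (size_t i = 0; i < sizeof m / sizeof m[0]; i++)
        printf("%-10s %.2f B/F\n", m[i].name,
               bytes_per_flop(m[i].mem_gbs, m[i].dp_gflops));
}
```

Calling print_balance() gives 0.48 for the FX100 and 0.50 for the K computer, inside the 0.4 ~ 0.5 band; the FX10 comes out slightly lower, at about 0.36.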

7 SMaC (Scalable Many Core) Concept & Approach
- Many-core-oriented, long-lasting architecture
  - Scalable performance improvement by increasing the number of cores
  - Increasing the number of cores should remain viable even in the post-Moore's-Law era, through 3D stacking and alternative, newer technologies
- Core: a middle-sized, general-purpose, out-of-order, superscalar processor
  - Good performance for a variety of applications
  - Low power through an optimal balance of resources and performance
- Assistant core: reduces OS jitter by processing I/O operations and asynchronous MPI
  - Highly scalable performance through low system noise
- Core Memory Group (CMG): a many-core building block, with ccNUMA between CMGs
  - Hierarchical structure for the hybrid parallel model
  - Optimized area and performance
FX100 CPU implementation (figure): two CMGs, each with its cores, an assistant core, and a shared L2 cache attached to its own memory interface (MAC); a Tofu2 controller and Tofu2 interface; a PCI controller and PCI interface.
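To make the CMG hierarchy concrete for the hybrid parallel model, here is a minimal sketch of a two-level block distribution: iterations are split first across CMGs (for example, one MPI process per CMG) and then across the compute cores of each CMG (for example, one OpenMP thread per core). The function names and the scheme itself are illustrative assumptions, not Fujitsu's runtime; the FX100 shape of 2 CMGs with 16 compute cores each comes from the previous slide.

```c
#include <stddef.h>

/* Split [lo, lo + n) into p nearly equal blocks and return block i as
   [*blo, *bhi). The first n % p blocks receive one extra element. */
static void block_range(size_t lo, size_t n, size_t p, size_t i,
                        size_t *blo, size_t *bhi) {
    size_t base = n / p, rem = n % p;
    *blo = lo + i * base + (i < rem ? i : rem);
    *bhi = *blo + base + (i < rem ? 1 : 0);
}

/* Hypothetical two-level (CMG, core) decomposition of n iterations:
   first across ncmg CMGs, then across the cores_per_cmg cores of the
   selected CMG. Each CMG's iterations form one contiguous range, which
   can be placed in that CMG's local memory (ccNUMA locality). */
void cmg_core_range(size_t n, size_t ncmg, size_t cores_per_cmg,
                    size_t cmg, size_t core, size_t *lo, size_t *hi) {
    size_t clo, chi;
    block_range(0, n, ncmg, cmg, &clo, &chi);
    block_range(clo, chi - clo, cores_per_cmg, core, lo, hi);
}
```

For n = 100 on an FX100-like 2 x 16 layout, CMG 1 owns iterations 50 to 99; its core 0 gets [50, 54) and its core 15 gets [97, 100).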

8 Post-K Goals and Approaches
Post-K goals
- High application performance and good power efficiency
- Keeping application compatibility while advancing from predecessors
- Good usability and better accessibility for users
Our approaches
- Developing high-performance, scalable, custom CPUs
  - Performance: wider SIMD, high memory bandwidth, mathematical acceleration primitives
  - Scalability: SMaC (scalable many-core), zero OS jitter (assistant cores)
  - Power efficiency: the best device technology, power control functions, optimal resources
- Maintaining performance balance and supporting advanced features
  - High memory bandwidth, the Tofu interconnect, and RIKEN advanced system software
- Adopting the ARM standard architecture
  - Cooperation with the ARM/Linux community and use of open-source software
  - Getting involved in the ARM HPC ecosystem

9 Post-K Powered by FUJITSU-designed CPU Cores & Tofu
- FUJITSU CPU cores support the ARM SVE ISA
  - As a lead partner in ARM SVE development, FUJITSU contributes to the specification of ARM SVE (Scalable Vector Extension) for application performance
- FUJITSU ARM cores incorporate FUJITSU's proven supercomputer microarchitecture
  - ARM SVE, plus optional functions and Tofu, maintain programming models and performance balance
- Post-K complies with ARM's standard frameworks (SBSA, etc.) for compatibility among platforms
Functions for performance (Post-K compared with FX100, FX10, and the K computer):
- Incorporated in SVE: SIMD width (Post-K 512 bit, FX100 256 bit, FX10 128 bit, K computer 128 bit), FMA4, mathematical acceleration primitives* (enhanced)
- Optional functions: inter-core barrier, sector cache (enhanced), prefetch modes (enhanced), Tofu interconnect (enhanced)
*Mathematical acceleration primitives include trigonometric functions (sine and cosine) and exponential...
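SVE code is vector-length agnostic: the same binary runs with any hardware vector length from 128 to 2048 bits (in 128-bit steps), and Post-K implements 512 bits. The portable C sketch below only emulates the idea in scalar code. The loop advances by a lane count known only at run time, and a per-lane predicate (playing the role of SVE's WHILELT) masks off the tail, so no separate cleanup loop is needed. The vl_bits parameter and the function name are hypothetical; real SVE code would use ACLE intrinsics or compiler auto-vectorization.

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar emulation of a vector-length-agnostic loop computing z = x + y.
   vl_bits stands in for the hardware vector length, which SVE code reads
   at run time; it must be a multiple of 128, as in SVE. */
void vla_add(double *z, const double *x, const double *y, size_t n,
             unsigned vl_bits) {
    size_t lanes = vl_bits / 64;             /* 64-bit doubles per vector */
    for (size_t i = 0; i < n; i += lanes) {  /* one "vector" per trip */
        for (size_t l = 0; l < lanes; l++) {
            bool active = (i + l) < n;       /* per-lane predicate */
            if (active)
                z[i + l] = x[i + l] + y[i + l];
        }
    }
}
```

Setting vl_bits to 128, 256, or 512 gives identical results; only the number of outer-loop trips changes (for n = 10: 5, 3, and 2 trips).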

10 System Software for Post-K
Currently in development, based on a co-design scheme with application developers that also covers the system hardware.
Software stack (figure): FUJITSU Technical Computing Suite / RIKEN Advanced System Software, layered between Post-K applications and the Post-K system hardware
- Management software
  - System management for highly available, power-saving operation
  - Job management for higher system utilization and power efficiency
- Hierarchical file I/O software
  - FEFS, a Lustre-based distributed file system
- Linux OS / McKernel (lightweight kernel)
- Programming environment
  - MPI (Open MPI, ), OpenMP, COARRAY, math libraries
  - Compilers (C, C++, Fortran)
  - Debugging and tuning tools

11 FUJITSU Compiler for Post-K
- Maximizes the execution performance of HPC applications
  - Covers a wide range of applications, including those dominated by integer calculations
  - Targets 512-bit-wide vectorization as well as vector-length-agnostic code; a fixed vector length facilitates optimizations such as constant folding
  - Inherits the options and features of the K computer, PRIMEHPC FX10, and FX100 compilers
- Language standard support
  - Fully supported: Fortran 2008, C11, C++14, OpenMP 4.5
  - Partially supported: Fortran 2015, C++1z, OpenMP 5.0
- Supports the ARM C Language Extensions (ACLE) for SVE
  - ACLE allows programmers to use SVE instructions as C intrinsic functions

    // C intrinsics in ACLE for SVE (loop body; p0 is the governing predicate)
    svfloat64_t z0 = svld1_f64(p0, &x[i]);
    svfloat64_t z1 = svld1_f64(p0, &y[i]);
    svfloat64_t z2 = svadd_f64_x(p0, z0, z1);
    svst1_f64(p0, &z[i], z2);

    // SVE assembler
    ld1d {z1.d}, p0/z, [x19, x3, lsl #3]
    ld1d {z0.d}, p0/z, [x20, x3, lsl #3]
    fadd z1.d, p0/m, z1.d, z0.d
    st1d {z1.d}, p0, [x21, x3, lsl #3]
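The ACLE fragment on this slide is the body of a loop. A complete vector-length-agnostic version might look like the sketch below. It assumes an SVE-capable compiler that defines __ARM_FEATURE_SVE and provides <arm_sve.h>; on other targets it falls back to a plain scalar loop so the file still compiles. The function name vadd_f64 is illustrative.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif

/* z[i] = x[i] + y[i] for i in [0, n). */
void vadd_f64(double *z, const double *x, const double *y, size_t n) {
#ifdef __ARM_FEATURE_SVE
    /* svcntd() is the number of 64-bit lanes in the hardware vector. */
    for (uint64_t i = 0; i < n; i += svcntd()) {
        /* Lanes with i + lane < n are active, so the tail needs no
           separate scalar loop. */
        svbool_t p0 = svwhilelt_b64_u64(i, (uint64_t)n);
        svfloat64_t z0 = svld1_f64(p0, &x[i]);
        svfloat64_t z1 = svld1_f64(p0, &y[i]);
        svfloat64_t z2 = svadd_f64_x(p0, z0, z1);
        svst1_f64(p0, &z[i], z2);
    }
#else
    /* Scalar fallback for targets without SVE. */
    for (size_t i = 0; i < n; i++)
        z[i] = x[i] + y[i];
#endif
}
```

Because the loop steps by svcntd() rather than a constant, the same binary runs on any SVE vector length. A compiler that targets Post-K's fixed 512-bit length can instead treat the step as the constant 8, which is what opens up the constant-folding optimizations mentioned on the slide.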

12 Vectorization by FUJITSU Compiler
Figure: Dynamic instruction counts of representative loops of NPB 3.3-SER (MG, BT, SP, LU), shown as the ratio of executed instructions for 256b SIMD (FX100), 512b SIMD (Post-K), and 512b SIMD estimated from the FX100 result; SIMD shares of 69%, 69%, and 72% are annotated.
Figure: Number of vectorized loops in TSVC* (Fortran and C) on FX100 and Post-K, for Fortran (135 loops), C (151 loops), and the TSVC total.
Sample of a loop vectorized with SVE:

    // s482
    for (int i = 0; i < LEN; i++) {
        a[i] += b[i] * c[i];
        if (c[i] > b[i]) break;
    }

*[Fortran] D. Callahan, J. Dongarra, and D. Levine, "Vectorizing compilers: a test suite and results," in Supercomputing '88, pp.
[C] S. Maleki, Y. Gao, M. J. Garzarán, T. Wong, and D. A. Padua, "An Evaluation of Vectorizing Compilers," PACT 2011, pp.
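Loop s482 is hard to vectorize because the loop may exit partway through a vector: element i is updated first, and only then may the loop stop. SVE handles this with predicates that cut a vector at the first element satisfying a condition (break instructions such as BRKA, which keeps lanes up to and including that element). The sketch below emulates the idea in portable scalar C, one hypothetical vector of `lanes` elements at a time. It illustrates the technique and is not the code the FUJITSU compiler generates.

```c
#include <stddef.h>

/* Reference: loop s482 from TSVC. */
void s482_scalar(double *a, const double *b, const double *c, size_t len) {
    for (size_t i = 0; i < len; i++) {
        a[i] += b[i] * c[i];
        if (c[i] > b[i]) break;
    }
}

/* Chunked emulation (lanes > 0): for each group of `lanes` elements, find
   the first lane whose exit condition holds, update every lane up to and
   including it (a BRKA-style predicate), then stop. The condition reads
   only b and c, so it can be evaluated before the update. */
void s482_chunked(double *a, const double *b, const double *c, size_t len,
                  size_t lanes) {
    for (size_t i = 0; i < len; i += lanes) {
        size_t end = (i + lanes < len) ? i + lanes : len;  /* tail mask */
        size_t stop = end;       /* one past the last active lane */
        int found = 0;
        for (size_t l = i; l < end; l++) {
            if (c[l] > b[l]) { stop = l + 1; found = 1; break; }
        }
        for (size_t l = i; l < stop; l++)  /* predicated update */
            a[l] += b[l] * c[l];
        if (found)
            return;
    }
}
```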

13 Discussion on the Performance Balance for Applications
- The effectiveness of this balance was evaluated with the meteorology application IFS*.
- The balanced growth of SIMD width and memory bandwidth from K to FX100 yields an IFS performance improvement.
- Keeping this performance balance across generations toward Post-K is expected to provide scalable speedup.

                 K computer    FX100
Flops / CPU      128 Gflops    1 Tflops
SIMD width       128 bit       256 bit
Memory BW        64 GB/s       480 GB/s
Bytes per flop   0.4 ~ 0.5

Figure: Speedup of IFS CNT4 (TL159, 96 cores)
(1) Bandwidth limits performance.
(2) Narrower SIMD limits performance; doubling the SIMD width gives 1.5x.
(3) Doubling B/F while keeping narrow SIMD gives insufficient gain compared with a balanced 2x B/F.
*The Integrated Forecasting System (IFS) is developed by ECMWF.
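The three observations on this slide follow the logic of a simple roofline model. Attainable performance is capped by whichever is lower: peak flops, which scale with SIMD width, or memory bandwidth times the kernel's arithmetic intensity (flops per byte). The C sketch below uses the K computer and FX100 figures from the slide. The intensity of 1 flop/byte, the doubled-bandwidth K configuration, and the function names are illustrative assumptions; this is not the methodology used for the IFS measurements.

```c
#include <stdio.h>

/* Simple roofline: attainable Gflop/s for a kernel with the given
   arithmetic intensity (flops per byte of memory traffic). */
double roofline_gflops(double peak_gflops, double mem_gbs,
                       double flops_per_byte) {
    double bw_bound = mem_gbs * flops_per_byte;
    return bw_bound < peak_gflops ? bw_bound : peak_gflops;
}

/* Print the attainable rate and which resource limits it. */
void report(const char *name, double peak_gflops, double mem_gbs,
            double flops_per_byte) {
    double perf = roofline_gflops(peak_gflops, mem_gbs, flops_per_byte);
    printf("%-22s %7.1f Gflop/s (%s-bound)\n", name, perf,
           perf < peak_gflops ? "bandwidth" : "compute");
}

void compare_balance(void) {
    const double ai = 1.0;  /* hypothetical kernel intensity */
    report("K computer", 128.0, 64.0, ai);
    report("K, 2x BW (narrow SIMD)", 128.0, 128.0, ai);  /* hypothetical */
    report("FX100", 1000.0, 480.0, ai);
}
```

With these numbers the K computer is bandwidth-bound at 64 Gflop/s. Doubling only its bandwidth reaches the 128 Gflop/s compute ceiling, after which more B/F brings no further gain without wider SIMD. The FX100, with both wider SIMD and more bandwidth, reaches 480 Gflop/s.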

14 Summary of Post-K Development
- Developing high-performance, scalable, custom CPUs
  - SMaC architecture with an assistant core for scalable performance
  - ARM instruction set architecture with SVE, as a standard architecture
  - ARM standard frameworks (SBSA, etc.) for compatibility among platforms
- Keeping performance balanced while advancing beyond preceding machines
  - Higher performance and higher data bandwidth
  - Advanced system software and applications
  - Co-design scheme with application developers
  - FUJITSU optimizing compilers for Post-K
- Performance balance is key to application speedup
- Post-K will meet requirements and be valuable for science and industry

15 Post-K: Succeeding the K computer Heritage


More information

Architecture, Programming and Performance of MIC Phi Coprocessor

Architecture, Programming and Performance of MIC Phi Coprocessor Architecture, Programming and Performance of MIC Phi Coprocessor JanuszKowalik, Piotr Arłukowicz Professor (ret), The Boeing Company, Washington, USA Assistant professor, Faculty of Mathematics, Physics

More information

Introduction of Oakforest-PACS

Introduction of Oakforest-PACS Introduction of Oakforest-PACS Hiroshi Nakamura Director of Information Technology Center The Univ. of Tokyo (Director of JCAHPC) Outline Supercomputer deployment plan in Japan What is JCAHPC? Oakforest-PACS

More information

Getting the best performance from massively parallel computer

Getting the best performance from massively parallel computer Getting the best performance from massively parallel computer June 6 th, 2013 Takashi Aoki Next Generation Technical Computing Unit Fujitsu Limited Agenda Second generation petascale supercomputer PRIMEHPC

More information

What can/should we measure with benchmarks?

What can/should we measure with benchmarks? What can/should we measure with benchmarks? Jun Makino Department of Planetology, Kobe University FS2020 Project, RIKEN-CCS SC18 BoF 107 Pros and Cons of HPCx benchmarks Nov 13 Overview Last 40 years of

More information

Experiences of the Development of the Supercomputers

Experiences of the Development of the Supercomputers Experiences of the Development of the Supercomputers - Earth Simulator and K Computer YOKOKAWA, Mitsuo Kobe University/RIKEN AICS Application Oriented Systems Developed in Japan No.1 systems in TOP500

More information

The Earth Simulator Current Status

The Earth Simulator Current Status The Earth Simulator Current Status SC13. 2013 Ken ichi Itakura (Earth Simulator Center, JAMSTEC) http://www.jamstec.go.jp 2013 SC13 NEC BOOTH PRESENTATION 1 JAMSTEC Organization Japan Agency for Marine-Earth

More information

What does Heterogeneity bring?

What does Heterogeneity bring? What does Heterogeneity bring? Ken Koch Scientific Advisor, CCS-DO, LANL LACSI 2006 Conference October 18, 2006 Some Terminology Homogeneous Of the same or similar nature or kind Uniform in structure or

More information

Intel High-Performance Computing. Technologies for Engineering

Intel High-Performance Computing. Technologies for Engineering 6. LS-DYNA Anwenderforum, Frankenthal 2007 Keynote-Vorträge II Intel High-Performance Computing Technologies for Engineering H. Cornelius Intel GmbH A - II - 29 Keynote-Vorträge II 6. LS-DYNA Anwenderforum,

More information

VOLTA: PROGRAMMABILITY AND PERFORMANCE. Jack Choquette NVIDIA Hot Chips 2017

VOLTA: PROGRAMMABILITY AND PERFORMANCE. Jack Choquette NVIDIA Hot Chips 2017 VOLTA: PROGRAMMABILITY AND PERFORMANCE Jack Choquette NVIDIA Hot Chips 2017 1 TESLA V100 21B transistors 815 mm 2 80 SM 5120 CUDA Cores 640 Tensor Cores 16 GB HBM2 900 GB/s HBM2 300 GB/s NVLink *full GV100

More information

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group

Aim High. Intel Technical Update Teratec 07 Symposium. June 20, Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Aim High Intel Technical Update Teratec 07 Symposium June 20, 2007 Stephen R. Wheat, Ph.D. Director, HPC Digital Enterprise Group Risk Factors Today s s presentations contain forward-looking statements.

More information

Mapping MPI+X Applications to Multi-GPU Architectures

Mapping MPI+X Applications to Multi-GPU Architectures Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under

More information

BOPS, Not FLOPS! A New Metric, Measuring Tool, and Roofline Performance Model For Datacenter Computing. Chen Zheng ICT,CAS

BOPS, Not FLOPS! A New Metric, Measuring Tool, and Roofline Performance Model For Datacenter Computing. Chen Zheng ICT,CAS BOPS, Not FLOPS! A New Metric, Measuring Tool, and Roofline Performance Model For Datacenter Computing Chen Zheng ICT,CAS Data Center Computing (DC ) HPC only takes 20% market share Big Data, AI, Internet

More information

April 2 nd, Bob Burroughs Director, HPC Solution Sales

April 2 nd, Bob Burroughs Director, HPC Solution Sales April 2 nd, 2019 Bob Burroughs Director, HPC Solution Sales Today - Introducing 2 nd Generation Intel Xeon Scalable Processors how Intel Speeds HPC performance Work Time System Peak Efficiency Software

More information

Overview of Supercomputer Systems. Supercomputing Division Information Technology Center The University of Tokyo

Overview of Supercomputer Systems. Supercomputing Division Information Technology Center The University of Tokyo Overview of Supercomputer Systems Supercomputing Division Information Technology Center The University of Tokyo Supercomputers at ITC, U. of Tokyo Oakleaf-fx (Fujitsu PRIMEHPC FX10) Total Peak performance

More information

designing a GPU Computing Solution

designing a GPU Computing Solution designing a GPU Computing Solution Patrick Van Reeth EMEA HPC Competency Center - GPU Computing Solutions Saturday, May the 29th, 2010 1 2010 Hewlett-Packard Development Company, L.P. The information contained

More information

High Performance Computing: Architecture, Applications, and SE Issues. Peter Strazdins

High Performance Computing: Architecture, Applications, and SE Issues. Peter Strazdins High Performance Computing: Architecture, Applications, and SE Issues Peter Strazdins Department of Computer Science, Australian National University e-mail: peter@cs.anu.edu.au May 17, 2004 COMP1800 Seminar2-1

More information

Innovative Alternate Architecture for Exascale Computing. Surya Hotha Director, Product Marketing

Innovative Alternate Architecture for Exascale Computing. Surya Hotha Director, Product Marketing Innovative Alternate Architecture for Exascale Computing Surya Hotha Director, Product Marketing Cavium Corporate Overview Enterprise Mobile Infrastructure Data Center and Cloud Service Provider Cloud

More information

Computing architectures Part 2 TMA4280 Introduction to Supercomputing

Computing architectures Part 2 TMA4280 Introduction to Supercomputing Computing architectures Part 2 TMA4280 Introduction to Supercomputing NTNU, IMF January 16. 2017 1 Supercomputing What is the motivation for Supercomputing? Solve complex problems fast and accurately:

More information

EARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA

EARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA EARLY EVALUATION OF THE CRAY XC40 SYSTEM THETA SUDHEER CHUNDURI, SCOTT PARKER, KEVIN HARMS, VITALI MOROZOV, CHRIS KNIGHT, KALYAN KUMARAN Performance Engineering Group Argonne Leadership Computing Facility

More information

High Performance Computing with Fujitsu

High Performance Computing with Fujitsu High Performance Computing with Fujitsu Ivo Doležel 0 2017 FUJITSU FUJITSU Software HPC Cluster Suite A complete HPC software stack solution HPC cluster general characteristics HPC clusters consist primarily

More information

SSD Based First Layer File System for the Next Generation Super-computer

SSD Based First Layer File System for the Next Generation Super-computer SSD Based First Layer File System for the Next Generation Super-omputer Shinji Sumimoto, Ph.D. Next Generation Tehnial Computing Unit FUJITSU LIMITED Sept. 24 th, 2018 0 Outline of This Talk A64FX: High

More information

High Performance Computing in C and C++

High Performance Computing in C and C++ High Performance Computing in C and C++ Rita Borgo Computer Science Department, Swansea University WELCOME BACK Course Administration Contact Details Dr. Rita Borgo Home page: http://cs.swan.ac.uk/~csrb/

More information

Course web site: teaching/courses/car. Piazza discussion forum:

Course web site:   teaching/courses/car. Piazza discussion forum: Announcements Course web site: http://www.inf.ed.ac.uk/ teaching/courses/car Lecture slides Tutorial problems Courseworks Piazza discussion forum: http://piazza.com/ed.ac.uk/spring2018/car Tutorials start

More information

Top500 Supercomputer list

Top500 Supercomputer list Top500 Supercomputer list Tends to represent parallel computers, so distributed systems such as SETI@Home are neglected. Does not consider storage or I/O issues Both custom designed machines and commodity

More information

Experiences in Optimizations of Preconditioned Iterative Solvers for FEM/FVM Applications & Matrix Assembly of FEM using Intel Xeon Phi

Experiences in Optimizations of Preconditioned Iterative Solvers for FEM/FVM Applications & Matrix Assembly of FEM using Intel Xeon Phi Experiences in Optimizations of Preconditioned Iterative Solvers for FEM/FVM Applications & Matrix Assembly of FEM using Intel Xeon Phi Kengo Nakajima Supercomputing Research Division Information Technology

More information

Performance Evaluation with the HPCC Benchmarks as a Guide on the Way to Peta Scale Systems

Performance Evaluation with the HPCC Benchmarks as a Guide on the Way to Peta Scale Systems Performance Evaluation with the HPCC Benchmarks as a Guide on the Way to Peta Scale Systems Rolf Rabenseifner, Michael M. Resch, Sunil Tiyyagura, Panagiotis Adamidis rabenseifner@hlrs.de resch@hlrs.de

More information

An Introduction to OpenACC

An Introduction to OpenACC An Introduction to OpenACC Alistair Hart Cray Exascale Research Initiative Europe 3 Timetable Day 1: Wednesday 29th August 2012 13:00 Welcome and overview 13:15 Session 1: An Introduction to OpenACC 13:15

More information

Thinking Outside of the Tera-Scale Box. Piotr Luszczek

Thinking Outside of the Tera-Scale Box. Piotr Luszczek Thinking Outside of the Tera-Scale Box Piotr Luszczek Brief History of Tera-flop: 1997 1997 ASCI Red Brief History of Tera-flop: 2007 Intel Polaris 2007 1997 ASCI Red Brief History of Tera-flop: GPGPU

More information

A Case for High Performance Computing with Virtual Machines

A Case for High Performance Computing with Virtual Machines A Case for High Performance Computing with Virtual Machines Wei Huang*, Jiuxing Liu +, Bulent Abali +, and Dhabaleswar K. Panda* *The Ohio State University +IBM T. J. Waston Research Center Presentation

More information

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes

Introduction: Modern computer architecture. The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Introduction: Modern computer architecture The stored program computer and its inherent bottlenecks Multi- and manycore chips and nodes Motivation: Multi-Cores where and why Introduction: Moore s law Intel

More information

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model

Parallel Programming Principle and Practice. Lecture 9 Introduction to GPGPUs and CUDA Programming Model Parallel Programming Principle and Practice Lecture 9 Introduction to GPGPUs and CUDA Programming Model Outline Introduction to GPGPUs and Cuda Programming Model The Cuda Thread Hierarchy / Memory Hierarchy

More information

IBM Blue Gene/Q solution

IBM Blue Gene/Q solution IBM Blue Gene/Q solution Pascal Vezolle vezolle@fr.ibm.com Broad IBM Technical Computing portfolio Hardware Blue Gene/Q Power Systems 86 Systems idataplex and Intelligent Cluster GPGPU / Intel MIC PureFlexSystems

More information

The next generation supercomputer. Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency

The next generation supercomputer. Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency The next generation supercomputer and NWP system of JMA Masami NARITA, Keiichi KATAYAMA Numerical Prediction Division, Japan Meteorological Agency Contents JMA supercomputer systems Current system (Mar

More information

CSC 447: Parallel Programming for Multi- Core and Cluster Systems

CSC 447: Parallel Programming for Multi- Core and Cluster Systems CSC 447: Parallel Programming for Multi- Core and Cluster Systems Why Parallel Computing? Haidar M. Harmanani Spring 2017 Definitions What is parallel? Webster: An arrangement or state that permits several

More information