Fujitsu's Technologies to the K Computer

Fujitsu's Technologies to the K Computer: a journey to a practical petascale computing platform
June 21, 2011
Motoi Okuda, FUJITSU Ltd.

Agenda
- The next-generation supercomputer project of Japan
- The K computer
- Design concept of the K computer
- Our technologies applied to the K computer
- Preliminary performance figures of the K computer
- Toward the post-10 PFlops era and Exa-scale computing
- Conclusion

History of the K computer project
- Project officially started in mid-2006
- System installation started in October 2010
- Partial system started test operation in April 2011
- Full system installation and adjustment will be completed by the middle of 2012
- Official operation will start by the end of 2012
- Application software projects are also running concurrently
[Timeline figure, 2006 to 2012: conceptual design; detailed design; prototype and evaluation; production, installation, and adjustment; tuning. Concurrent application projects: Next-Generation Integrated Nano-science Simulation; Next-Generation Integrated Simulation of Living Matter]

Pre-history of the K computer project
- The High-end Computing WG initiated a feasibility study in 2001 for a future high-end computing environment, from the application point of view
- National grid project started in 2003
- Primary R&D project started in 2005
[Timeline figure, 2001 to 2012: WG for High-end Computing; National Grid Project NAREGI; primary R&D projects for the Next Generation Supercomputer; conceptual design; detailed design; prototype and evaluation; production, installation, and adjustment; tuning. Concurrent application projects: Next-Generation Integrated Nano-science Simulation; Next-Generation Integrated Simulation of Living Matter]

Target Applications of the K computer
[Figure courtesy of RIKEN]

Design target of the K computer
Toward wider coverage of applications and higher performance on those applications
- High performance: 10 PFlops at LINPACK
- High productivity: easy to extract high performance from highly parallel programs without an inordinate burden on programmers; sophisticated language and programmer support environment
- High operability: low power consumption; high reliability and easy to operate
- Ensuring the target date: mid-2012

Fujitsu's technologies applied to the K computer
- SPARC64(TM) VIIIfx processor
  - HPC-ACE (SPARC V9 architecture enhancement for HPC): 128 GFlops
  - SIMD and register enhancements
  - Software-controllable cache
  - Hardware barrier between cores
  - Mainframe-CPU level of high-reliability design
  - Low power consumption: ~58 W
- New interconnect, Tofu
  - 6-dimensional mesh/torus topology
  - High-speed, highly scalable, high-operability and high-availability interconnect for systems of over 100,000 nodes
  - Functional interconnect
- Single CPU per node configuration
  - High memory bandwidth and simple memory hierarchy
- CPU/ICC direct water cooling
  - High reliability, low power consumption and compact packaging
System figures: LINPACK 10 PFlops; over 1 PB of memory; 800 racks; 80,000 CPUs; 640,000 cores
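The system figures on this slide can be cross-checked with a little arithmetic. The sketch below is only an illustration using the rounded numbers quoted above (128 GFlops per CPU, 80,000 CPUs, 640,000 cores); the variable names are made up for the example. It gives a theoretical peak, which is not the same thing as the measured LINPACK figure the project targets.

```python
# Rounded figures quoted on the slide (not exact installation counts).
GFLOPS_PER_CPU = 128   # SPARC64 VIIIfx peak with HPC-ACE
CPUS = 80_000
CORES = 640_000

cores_per_cpu = CORES // CPUS                      # cores on each CPU
gflops_per_core = GFLOPS_PER_CPU / cores_per_cpu   # peak per core
peak_pflops = GFLOPS_PER_CPU * CPUS / 1e6          # 1 PFlops = 1e6 GFlops

print(f"{cores_per_cpu} cores per CPU, {gflops_per_core:.0f} GFlops per core")
print(f"theoretical peak of about {peak_pflops:.2f} PFlops")
```

With these rounded inputs the theoretical peak comes out slightly above 10 PFlops, so the LINPACK target depends on both the final CPU count and the efficiency actually achieved.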

Fujitsu's technologies applied to the K computer (cont.)
Software environment
- Applications
- HPC Portal / System Management Portal
- System operations management: system configuration management; system monitoring; system installation and operation
- Job operations management: job manager; job scheduler; resource management
- High-performance file system: Lustre-based distributed file system; high scalability; I/O bandwidth guarantee; high reliability and availability
- Compiler (Fortran, C, C++): hybrid parallel programming; sector cache support; SIMD/register file extensions
- MPI and math libraries: tuned for the hardware
- Support tools: profiler and tuning tools; interactive debugger
- Linux-based OS enhanced for the K computer
- The K computer

The K computer's Performance
Productivity: LINPACK performance and its efficiency (efficiency = RMax / RPeak, where RMax is the LINPACK performance and RPeak is the peak performance)
[Figure, June 2011: performance efficiency versus performance (RMax, PFlops). Labelled systems: GSIC (Japan); NSCS (China); Jaguar (US), Opteron; NSCT (China); K computer (subset); other Fujitsu systems]
K computer (subset): 68,544 CPUs, 548,352 cores, SPARC64(TM) VIIIfx; 8.162 PFlops, 93.0% efficiency
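The 93.0% efficiency can be reproduced from the numbers in this deck. The sketch below assumes the 128 GFlops per-CPU peak quoted on the processor slide; the function name `linpack_efficiency` is hypothetical and chosen only for this example.

```python
def linpack_efficiency(rmax_pflops, cpus, gflops_per_cpu):
    """Return (efficiency, rpeak_pflops) where efficiency = RMax / RPeak."""
    rpeak_pflops = cpus * gflops_per_cpu / 1e6   # 1 PFlops = 1e6 GFlops
    return rmax_pflops / rpeak_pflops, rpeak_pflops

# K computer (subset), June 2011, as quoted on the slide.
efficiency, rpeak = linpack_efficiency(8.162, 68_544, 128)
cores = 68_544 * 8   # 8 cores per CPU

print(f"RPeak = {rpeak:.3f} PFlops, efficiency = {efficiency:.1%}, cores = {cores:,}")
```

The core count derived from 8 cores per CPU matches the 548,352 cores shown on the slide.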

The K computer's Performance (cont.)
Greenness: LINPACK performance and its power consumption
[Figure, June 2011: power efficiency (RMax MFlops/W) versus performance (RMax, PFlops). Labelled systems: IBM BlueGene/Q prototype (US), PowerBQC; Nagasaki Univ. (Japan); CINECA/SCS (Italy); GSIC (Japan); NSCT (China); FZJ (Germany), XCell; K computer (subset)]
K computer (subset): 825 MFlops/W, SPARC64(TM) VIIIfx
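Power efficiency and LINPACK performance together imply a power draw during the run. The sketch below is a rough back-of-the-envelope estimate, assuming the 8.162 PFlops RMax from the efficiency slide and the 825 MFlops/W shown here; `implied_power_mw` is a hypothetical helper name, and the result is derived rather than a figure stated in the deck.

```python
def implied_power_mw(rmax_pflops, mflops_per_watt):
    """Estimate power in megawatts from RMax and power efficiency."""
    rmax_mflops = rmax_pflops * 1e9          # 1 PFlops = 1e9 MFlops
    watts = rmax_mflops / mflops_per_watt
    return watts / 1e6

power_mw = implied_power_mw(8.162, 825)
print(f"implied power during LINPACK: about {power_mw:.2f} MW")
```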

The K computer's Performance (cont.)
LINPACK performance and its computing time
[Figure, June 2011: computing time (hours) versus performance (RMax, PFlops). Labelled systems: JAXA (Japan), FX1, SPARC64(TM) VII; Jaguar (US), Opteron; NSCT (China); K computer (subset)]
K computer (subset): 28 hr, SPARC64(TM) VIIIfx

The K computer's Performance (cont.)
Productivity and greenness: LINPACK performance efficiency and power consumption (efficiency = RMax / RPeak, where RMax is the LINPACK performance and RPeak is the peak performance)
[Figure, June 2011: performance efficiency (productivity) versus power efficiency (RMax MFlops/W, greenness). Labelled systems: FZJ (Germany), etc., XCell; NSCT (China); K computer (subset), SPARC64(TM) VIIIfx; CINECA/SCS (Italy); GSIC (Japan); IBM BlueGene/Q prototype (US), PowerBQC; Nagasaki Univ. (Japan). Systems with no registered power data are marked separately]

The K computer's Performance (cont.)
Scalability: example of fundamental benchmark (BMT) performance on a 1.05 PFlops system*
- Highly efficient threading between cores and the functional interconnect
[Figure: scalability of the HIMENO-BMT*** (XL size, 1,024 x 512 x 512) versus number of cores. Series: hybrid execution** with integrated MPI support; hybrid execution** without integrated MPI support; flat MPI execution with integrated MPI support]
* 65,536 cores, 8,192 CPUs
** 8 threads per node + MPI
*** HIMENO-BMT: a benchmark program that measures the speed of the major loops in solving Poisson's equation with the Jacobi iteration method. Grid size XL was used in this measurement.
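To make the footnote concrete, here is a minimal sketch of Jacobi iteration for a 3D Poisson problem. It is not the HIMENO-BMT kernel itself: it assumes a plain 7-point stencil on a tiny uniform grid with fixed boundary values, in serial pure Python, and the names `jacobi_sweep` and `solve` are hypothetical. It only illustrates the kind of loop nest the benchmark times.

```python
def jacobi_sweep(p, rhs, h):
    """One Jacobi sweep for laplacian(p) = rhs; boundary values stay fixed."""
    nx, ny, nz = len(p), len(p[0]), len(p[0][0])
    new = [[row[:] for row in plane] for plane in p]   # copy, old values stay readable
    max_change = 0.0
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                neighbours = (p[i - 1][j][k] + p[i + 1][j][k]
                              + p[i][j - 1][k] + p[i][j + 1][k]
                              + p[i][j][k - 1] + p[i][j][k + 1])
                new[i][j][k] = (neighbours - h * h * rhs[i][j][k]) / 6.0
                max_change = max(max_change, abs(new[i][j][k] - p[i][j][k]))
    return new, max_change


def solve(p, rhs, h, tol=1e-10, max_sweeps=10_000):
    """Repeat sweeps until the largest update falls below tol."""
    sweeps = 0
    for sweeps in range(1, max_sweeps + 1):
        p, change = jacobi_sweep(p, rhs, h)
        if change < tol:
            break
    return p, sweeps


def on_boundary(i, j, k, n):
    return 0 in (i, j, k) or (n - 1) in (i, j, k)


# Tiny demo: zero right-hand side, boundary values equal to the x index.
# The exact discrete solution is then p[i][j][k] = i everywhere.
n, h = 6, 1.0
p0 = [[[float(i) if on_boundary(i, j, k, n) else 0.0 for k in range(n)]
       for j in range(n)] for i in range(n)]
rhs = [[[0.0] * n for _ in range(n)] for _ in range(n)]
p, sweeps = solve(p0, rhs, h)
print(f"converged after {sweeps} sweeps; p[2][3][3] = {p[2][3][3]:.6f}")
```

Because every point in a sweep depends only on values from the previous sweep, the three nested loops can be split across threads and MPI ranks, which is what the hybrid and flat MPI series in the figure compare.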

Applications running on the K Computer
- Several real applications are now running on the K computer, which is in its test operation phase
- First-priority applications have been optimized, tested and evaluated

| Program | Discipline | Outline | Scheme |
| NICAM | Earth science | Nonhydrostatic ICosahedral Atmospheric Model (NICAM) for Global-Cloud Resolving Simulations | FDM (atmosphere) |
| Seism3D | Earth science | Simulation of Seismic-Wave Propagation and Strong Ground Motions | FDM (wave) |
| FrontFlow/Blue | Engineering | Unsteady Flow Analysis based on Large Eddy Simulation (LES) | FEM (fluid) |
| PHASE | Material science | First-Principles Simulation within the Plane-Wave Pseudopotential formalism | DFT (plane wave) |
| RSDFT | Material science | Ab-initio Calculation in Real Space | Real-space DFT |
| LatticeQCD | Physics | Study of elementary particle and nuclear physics based on Lattice QCD simulation | QCD |

Others: more than 20 applications are being optimized and tested on the K computer

For the post-10 PFlops era and Exa-scale computing
- Expansion and brush-up of current technologies for practical 10 PFlops class computing
- Technology jump for practical 100 PFlops class computing
- Tight collaboration, co-work and concurrent development with target applications
Technologies shown: many-core architecture; accelerator technologies; CPU-integrated interconnect I/F; on-board optical link; 3D stacked memory; sub-20 nm semiconductor technology; FPGA and reconfigurable LSI; CNT technologies; graphene technologies; optical computer; quantum computer; DNA computer
The Japanese HPC community's big questions after March 11th:
- How does Exa-scale computing contribute to society?
- Which applications need Exa-scale computing power?

Conclusion
- The K computer targets practical PFlops-class computing.
- Several of Fujitsu's leading-edge technologies were applied to the K computer and achieved excellent performance, productivity and operability.
- How to use this huge computing power to make a safe, reliable and sustainable society a reality is Fujitsu's next and true challenge.
- This is a milestone on the way to real Exa-scale computing.
- Fujitsu will continue its effort toward real Exa-scale computing.