Current Status of the Next-Generation Supercomputer in Japan. YOKOKAWA, Mitsuo, Next-Generation Supercomputer R&D Center, RIKEN


Current Status of the Next-Generation Supercomputer in Japan. YOKOKAWA, Mitsuo, Next-Generation Supercomputer R&D Center, RIKEN. International Workshop on Peta-Scale Computing Programming Environment, Languages and Tools (WPSE2010), February 18, 2010

Six goals of Japan's Third Science and Technology Basic Plan (FY2006-FY2010):
- Goal One: Discovery & Creation of Knowledge toward the Future
- Goal Two: Breakthroughs in Advanced Science and Technology
- Goal Three: Sustainable Development - Consistent with Economy and Environment
- Goal Four: Innovator Japan - Strength in Economy & Industry
- Goal Five: Good Health over Lifetime
- Goal Six: Safe and Secure Nation
The Development and Application of the Next-Generation Supercomputer supports these goals through applications such as the Milky Way formation process, the planet formation process, nuclear reactor analysis, rocket engine design, prediction of the influence of the El Nino phenomenon, the aurora outbreak process, nanotechnology, car development, biomolecular MD, tsunami damage prediction, cloud analysis, and multi-level unified simulation.

Key Technologies of National Importance: X-Ray Free Electron Laser (XFEL), Next-Generation Supercomputer, Ocean and Earth Exploration System, Space Transport System, Fast Breeder Reactor & Fuel Recycle Technology.

Outline of the Next-Generation Supercomputer project. The objectives are to develop the world's most advanced, highest-performance supercomputer and to develop and deploy its usage technologies, including application software, as one of Japan's Key Technologies of National Importance. Period of the project: FY2006-FY2012. RIKEN (The Institute of Physical and Chemical Research) plays the central role in the project, developing the supercomputer under the law on the shared use of large-scale experimental facilities that are unique in Japan (the so-called common-facilities law).

Goals of the project: development and installation of the most advanced, high-performance supercomputer system, with a LINPACK performance of 10 petaflops; development and deployment of application software in various science and engineering fields, made to attain the system's maximum capability; and establishment of the Advanced Institute for Computational Science (tentative name) as one of the Centers of Excellence built around supercomputing facilities.

Major applications on the Next-Generation Supercomputer targeted as Grand Challenges.

System Configuration

Current System Configuration - a scalar-processor-based system.

Compute Nodes
- Compute nodes (CPUs): > 80,000
- Number of cores: > 640,000
- Peak performance: > 10 PFLOPS
- Memory: > 1 PB (16 GB/node)
- Logical 3-dimensional torus network: peak bandwidth 5 GB/s x 2 for each direction of the logical 3-dimensional torus network; bi-section bandwidth > 30 TB/s
[Photos: SPARC64(TM) VIIIfx CPU and compute node. Courtesy of FUJITSU Ltd.]
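
As a rough sanity check of these figures, the following minimal C sketch derives the system-level aggregates from the per-node numbers quoted above; the 80,000-node count is the slide's lower bound used here as an illustrative round value.

#include <stdio.h>

int main(void) {
    /* Per-node figures quoted on the slide (illustrative round values). */
    const double nodes            = 80000.0;   /* > 80,000 compute nodes   */
    const double cores_per_node   = 8.0;       /* SPARC64 VIIIfx: 8 cores  */
    const double gflops_per_node  = 128.0;     /* 16 GFLOPS/core x 8 cores */
    const double mem_gb_per_node  = 16.0;      /* 16 GB of memory per node */

    printf("cores : %.0f\n",        nodes * cores_per_node);            /* 640,000       */
    printf("peak  : %.2f PFLOPS\n", nodes * gflops_per_node / 1.0e6);   /* ~10.24 PFLOPS */
    printf("memory: %.2f PB\n",     nodes * mem_gb_per_node / 1.0e6);   /* ~1.28 PB      */
    return 0;
}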

CPU Features (Fujitsu SPARC64(TM) VIIIfx)
- 8 cores
- 2 SIMD operation circuits per core; two multiply-and-add floating-point operations (SP or DP) are executed in one SIMD instruction
- 256 FP registers (double precision)
- Shared 5 MB L2 cache (10-way)
- Hardware barrier
- Prefetch instructions
- Software-controllable cache (sectored cache)
- Performance: 16 GFLOPS/core, 128 GFLOPS/CPU
- 45 nm CMOS process, 2 GHz, 22.7 mm x 22.6 mm, 760 M transistors, 58 W (at 30 degrees C, water cooled)
Reference: SPARC64(TM) VIIIfx Extensions, http://img.jp.fujitsu.com/downloads/jp/jhpc/sparc64viiifx-extensions.pdf
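
The 16 GFLOPS/core figure is consistent with the clock rate and SIMD/FMA structure listed above. The short C sketch below shows one way to account for it, assuming two SIMD units per core, each issuing two fused multiply-add operations (2 flops each) per cycle at 2 GHz.

#include <stdio.h>

int main(void) {
    const double clock_ghz     = 2.0;  /* 2 GHz clock                    */
    const double simd_units    = 2.0;  /* two SIMD operation circuits    */
    const double fma_per_unit  = 2.0;  /* two multiply-adds per SIMD op  */
    const double flops_per_fma = 2.0;  /* multiply + add = 2 flops       */

    double gflops_core = clock_ghz * simd_units * fma_per_unit * flops_per_fma;
    printf("peak per core: %.0f GFLOPS\n", gflops_core);        /* 16  */
    printf("peak per CPU : %.0f GFLOPS\n", gflops_core * 8.0);  /* 128 */
    return 0;
}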

System Environments
- OS: a Linux-based OS on the compute nodes
- Two-level, large-scale distributed file system with a local file system and a global file system; users' permanent files reside on the global file system
- Staging functions: files on the global file system that are used in a job are staged in to the local file system, and data generated during job execution are moved back to the global file system after the job finishes
- File-sharing functions using NFS-like mechanisms: users can access files on the global file system directly from the front-end servers
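
The staging flow can be pictured as a stage-in / compute / stage-out sequence around the job. The C sketch below is only a conceptual illustration with hypothetical path names and a hypothetical ./solver executable; on the real system this data movement is handled by the job-management software, not by user code.

#include <stdio.h>
#include <stdlib.h>

/* Copy a file byte by byte (illustration only; no error recovery). */
static int copy_file(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb"), *out = fopen(dst, "wb");
    if (!in || !out) return -1;
    char buf[1 << 16];
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    fclose(out);
    return 0;
}

int main(void) {
    /* Hypothetical paths standing in for the global and local file systems. */
    if (copy_file("/global/user/input.dat", "/local/job/input.dat") != 0)    /* stage-in  */
        return 1;
    system("./solver /local/job/input.dat /local/job/output.dat");           /* the job   */
    copy_file("/local/job/output.dat", "/global/user/output.dat");           /* stage-out */
    return 0;
}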

Job Environments: a batch-job-oriented system; interactive environments are available for debugging.

Flow of batch job processing

Programming languages and compilers
- Fortran 2003, XPFortran, C, and C++, including several extensions to GNU C and C++
- Compilers optimized to obtain the full capability of SIMD, the 256 FP registers, the programmable L1 and L2 caches, and so on
- Two quadruple-precision floating-point data formats are supported: the IEEE 754R data format, and an original double-double format that represents one value with two double-precision numbers in the arithmetic libraries
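
The "double-double" format represents one quadruple-precision value as the unevaluated sum of two doubles (a high and a low part). The C sketch below shows the classic error-free addition idea (Knuth's two-sum) that such arithmetic libraries build on; it illustrates the general technique only and is not Fujitsu's implementation.

#include <stdio.h>

/* A double-double value: hi + lo, where lo holds the rounding error of hi. */
typedef struct { double hi, lo; } dd_t;

/* Error-free sum of two doubles (Knuth's two-sum):
   returns s = fl(a+b) and the exact error e, so that a + b = s + e. */
static dd_t two_sum(double a, double b) {
    double s  = a + b;
    double bb = s - a;
    double e  = (a - (s - bb)) + (b - bb);
    dd_t r = { s, e };
    return r;
}

/* Add a plain double to a double-double (simplified; real libraries
   renormalize more carefully and also provide dd+dd, dd*dd, etc.). */
static dd_t dd_add_d(dd_t a, double b) {
    dd_t s = two_sum(a.hi, b);
    s.lo += a.lo;
    return two_sum(s.hi, s.lo);
}

int main(void) {
    dd_t acc = { 0.0, 0.0 };
    for (int i = 0; i < 10; i++)
        acc = dd_add_d(acc, 0.1);   /* sum 0.1 ten times; lo keeps the residual error */
    printf("hi = %.17g  lo = %.17g\n", acc.hi, acc.lo);
    return 0;
}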

Libraries and program development environment (PDE) tools
- MPI library based on the MPI-2.1 specification: low latency and high throughput; high-performance Bcast, Allgather, Alltoall, and Allreduce functions supported in hardware by the multi-dimensional mesh/torus interconnect network
- Numerical libraries: BLAS, LAPACK, FFTW, the Fujitsu scientific numerical library SSL II, and so on will be provided
- Program development tools: a debugger using the DWARF2 debugging data format, and performance analyzers for job profiles and MPI function behavior
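
As a usage illustration of the hardware-accelerated collectives mentioned above, here is a minimal MPI_Allreduce example in C; it uses only the standard MPI-2.1 API and nothing specific to this machine.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes one value; Allreduce returns the global sum
       on every rank (a collective the interconnect can accelerate).     */
    double local = (double)(rank + 1);
    double sum   = 0.0;
    MPI_Allreduce(&local, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %g\n", size, sum);

    MPI_Finalize();
    return 0;
}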

Programming models
- A hybrid programming model with multi-threading and MPI is strongly recommended; a flat programming model using MPI only is also supported.
- Within a core: optimizations and SIMD operation generation by the compilers.
- Within a CPU: multi-threading by OpenMP directives and/or automatic parallelization by the compiler.
- Among CPUs: parallelization with the MPI libraries, or programming in the high-level Fortran extension XPFortran.
A minimal sketch of the hybrid model follows.
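
The following is a minimal C sketch of the recommended hybrid model, assuming one MPI rank per CPU and OpenMP threads within the CPU; thread counts and process placement are left to the runtime and are not shown.

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int N = 1000000;
    double local = 0.0;

    /* Threads within one CPU (OpenMP). */
    #pragma omp parallel for reduction(+:local)
    for (int i = 0; i < N; i++)
        local += 1.0 / (double)N;

    /* Combine the per-rank partial results across CPUs (MPI). */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("global sum = %g (threads per rank: %d)\n", global, omp_get_max_threads());

    MPI_Finalize();
    return 0;
}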

Schedule of the project. [Timeline chart; the "We are here" marker indicates the project's current point, February 2010.]

Photo of the prototype system. The prototype system has been built; several system boards are assembled and set into cabinets. [Photos: system board and interconnect LSI. Courtesy of FUJITSU Ltd.]

Facilities for the System

Location of the supercomputer site: Kobe City, about 450 km (280 miles) west of Tokyo. [Map showing Kobe and Tokyo.]

Artist's image of the building.

[Aerial photo (2008/10/9): the Next-Generation Supercomputer site on Port Island, Kobe, near Sannomiya and Port Island South Station.]

[Site layout: Research Building, Computer Building, Cooling Facility, Power Facility.]

Photos of the site under construction, taken from the south side: November 2008, March 2009, July 2009, and September 2009.

Photos of the buildings (as of January 2010): Research Building, Computer Building, chillers, substation, and supply facilities.

Inside the buildings: the computer room (3rd floor), where a double floor is being built; the chillers; solar panels on the roof; and the research building.

Summary: The system design has been completed, and a prototype system has been built and is being evaluated. The buildings for the system will be completed at the end of May 2010.

Thank you for your attention!