Current Status of the Next- Generation Supercomputer in Japan. YOKOKAWA, Mitsuo Next-Generation Supercomputer R&D Center RIKEN

Current Status of the Next- Generation Supercomputer in Japan YOKOKAWA, Mitsuo Next-Generation Supercomputer R&D Center RIKEN International Workshop on Peta-Scale Computing Programming Environment, Languages and Tools (WPSE2010),February 18, 2010

Six goals of the Japan's Third Science and Technology Basic Plan in FY2006 FY2010 Goal One Discovery & Creation of Knowledge toward the Future Goal Two Breakthroughs in Advanced Science and Technology Goal Three Sustainable Development - Consistent with Economy and Environment - Milky Way formation process Planet formation process Nuclear reactor analysis Rocket engine design An influence prediction of El Nino phenomenon Aurora outbreak process Development and Application of Next-Generation Supercomputer Nano technology Car development Biomolecular MD Tsunami damage prediction Clouds analysis Multi-level unified simulation Goal Four Innovator Japan - Strength in Economy & Industry - Goal Five Good Health over Lifetime Goal Five Safe and Secure Nation 2010/02/18 WPSE2010 1

Key Technologies for National Importance X-Ray Free Electron Laser (XFEL) Next-Generation Supercomputer Ocean and Earth Exploratin System Space Transport System Fast Breeder Reactor & Fuel Recycle Technology 2010/02/18 WPSE2010 2

Outline of the Next-Generation Supercomputer project Objectives are to develop the world's most advanced and high-performance supercomputer, to develop and deploy its usage technologies including application software as one of Japan's Key Technologies of National Importance. Period of the project: FY2006-FY2012 RIKEN (The Institute of Physical and Chemical Research ) plays the central role of the project in developing the supercomputer under the law on sharing large scale experimental facilities which are unique in Japan, or so-called common-facilities law. 2010/02/18 WPSE2010 3

Goals of the project Development and installation of the most advanced high performance supercomputer system with LINPACK performance of 10 petaflops. Development and deployment of application software, which should be made to attain the system maximum capability, in various science and engineering fields. Establishment of an Advanced Institute for Computational Science (tentative) as one of the Center of Excellences around supercomputing facilities. 2010/02/18 WPSE2010 4

Major applications on Next-Generation Supercomputer Targeted as Grand Challenges 2010/02/18 WPSE2010 5

System Configuration 2010/02/18 WPSE2010 6

Current System Configuration - Scalar processors based system 2010/02/18 WPSE2010 7

Compute Nodes Compute nodes (CPUs): > 80,000 Number of cores: > 640,000 Peak performance: > 10PFLOPS Memory: > 1PB (16GB/node) Logical 3-dimensional torus network Peak bandwidth: 5GB/s x 2 for each direction of logical 3-dimensional torus network bi-section bandwidth: > 30TB/s SPARC64 TM VIIIfx Compute node Courtesy of FUJITSU Ltd. 2010/02/18 WPSE2010 8

CPU Features (Fujitsu SPARC64 TM VIIIfx) 8 cores 2 SIMD operation circuit 2 Multiply & add floating-point operations (SP or DP) are executed in one SIMD instruction 256 FP registers (double precision) Shared 5MB L2 Cache (10way) Hardware barrier Prefetch instruction Software controllable cache - Sectored cache Performance 16GFLOPS/core, 128GFLOPS/CPU Reference: SPARC64 TM VIIIfx Extensions http://img.jp.fujitsu.com/downloads/jp/jhpc/sparc64viiifx-extensions.pdf 45nm CMOS process, 2GHz 22.7mm x 22.6mm 760 M transisters 58W(at 30 by water cooling) 2010/02/18 WPSE2010 9

System Environments OS: Linux based OS on compute nodes Two-level large-scale distributed file system with local file system and global file system Users permanent file on the global file system Staging functions Files on the global file system which are used in a job are staged into the local file system. Data generated during a job execution are moved back to the global file system after the job finished. File sharing functions by using NSF-like functions Users can access the file on the global file directly from frontend servers. 2010/02/18 WPSE2010 10

Job Environments Batch job-oriented system Interactive environments are available in debugging 2010/02/18 WPSE2010 11

Flow of batch job processing 2010/02/18 WPSE2010 12

Programming languages and compilers Fortran 2003,XPFortran,C,and C++ including several extensions to GNU C and C++ Optimized compilers to obtain full capabilities of SIMD, 256 FP registers, programmable L1 and L2 caches, and so on. Two types of a quadruple precision floating point data format are supported. IEEE754R data format Original double-double data format by using two double precision floating point data in arithmetic libraries 2010/02/18 WPSE2010 13

Libraries and PDE tools MPI library based on the MPI-2.1 specification Low-latency and high-throughput High performance functions of Bcast, Allgather, Alltoall, and Allreduce by the hardware support of the multi-dimensional mesh/torus interconnect network. Numerical libraries BLAS, LAPACK, FFTW, Fujitsu scientific numerical library SSL II, and so on will be provided. Tools for a program development environment Debugger with DWARF2 debugging data format Performance analyzer of job profiles and MPI functions behavior 2010/02/18 WPSE2010 14

Programming models A hybrid programming model with multi-threads and MPI is strongly recommended. A flat programming model by MPI only is also supported. Optimizations and SIMD operation generations by compilers in a core multi-threads by OpenMP directives and/or automatic parallelization by compiler on a CPU Parallelization by MPI libraries or programmming with a high-level Fortran language XPFortran among CPUs Memory Memory Memory Memory 2010/02/18 WPSE2010 15

Schedule of the project We are here. 2010/02/18 WPSE2010 16

Photo of proto-type system Prototype system has been built. Several system boards are compiled and set into a cabinets. System Board システムボード LSI for interconnect Courtesy of FUJITSU Ltd. 2010/02/18 WPSE2010 17

Facilities for the System 2010/02/18 WPSE2010 18

Location of the supercomputer site, Kobe-City Kobe Tokyo 450km (280miles) west from Tokyo 2010/02/18 WPSE2010 19

Artists image of a building 2010/02/18 WPSE2010 20

Sannomiya Port Island South Sta. Next-Generation Supercomputer Site Photo: 2008/10/9 2010/02/18 WPSE2010 21

Research Building Computer Building Cooling Facility Power Facility 2010/02/18 WPSE2010 22

Photo of the site (under construction) November, 2008 March, 2009 July, 2009 September, 2009 Photos from South-Side 2010/02/18 WPSE2010 23

Photo of buildings (as of January, 2010) Research Building Computer building Chillers Substation Supply 2010/02/18 WPSE2010 24

Inside the buildings Computer room (3F) Making a double floor Chillers Solar panels on the top Research building 2010/02/18 WPSE2010 25

Summary System design was completed and a prototype system was built and is being evaluated. The buildings for the system will be completed in the end of May, 2010. 2010/02/18 WPSE2010 26

Thank you for your attention! 2010/02/18 WPSE2010 27