Fujitsu's Technologies for the K Computer: A Journey to a Practical Petascale Computing Platform. June 21st, 2011. Motoi Okuda, FUJITSU Ltd.
Agenda
The next-generation supercomputer project of Japan
The K computer: design concept, our technologies applied to the K computer, and preliminary performance figures
Toward the post-10 PFlops era and Exa-scale computing
Conclusion
History of the K computer project
The project officially started in mid-2006. System installation started in October 2010, and a partial system began test operation in April 2011. Installation and adjustment of the full system will be completed by mid-2012, and official operation will start by the end of 2012. Application software projects are running concurrently.
Timeline, 2006 to 2012: conceptual design; detailed design; prototype and evaluation; production, installation, and adjustment; tuning. The Next-Generation Integrated Nano-science Simulation and Next-Generation Integrated Simulation of Living Matter projects run in parallel.
Pre-history of the K computer project
In 2001, the WG for High-end Computing initiated a feasibility study of the future high-end computing environment from the application point of view. The national grid project (NAREGI) started in 2003, and the primary R&D project started in 2005.
Timeline, 2001 to 2012: WG for High-end Computing; National Grid Project NAREGI; primary R&D projects for the next-generation supercomputer; conceptual design; detailed design; prototype and evaluation; production, installation, and adjustment; tuning. The Next-Generation Integrated Nano-science Simulation and Next-Generation Integrated Simulation of Living Matter projects run alongside.
Target Applications of the K computer
[Figure: target application areas. Courtesy of RIKEN]
Design target of the K computer
Toward wider coverage of applications and higher performance on those applications:
High performance: 10 PFlops on LINPACK.
High productivity: easy to extract high performance from highly parallel programs without an inordinate burden on programmers; a sophisticated language and programming support environment.
High operability: low power consumption, high reliability, and ease of operation.
Ensuring the target date: mid-2012.
Fujitsu's technologies applied to the K computer
SPARC64™ VIIIfx processor: HPC-ACE (SPARC V9 Architecture Enhancement for HPC) delivering 128 GFlops per CPU; SIMD and register enhancements; software-controllable cache; hardware barrier between cores; mainframe-class high-reliability design; low power consumption (about 58 W).
New interconnect, Tofu: 6-dimensional mesh/torus topology; a high-speed, highly scalable, highly operable, and highly available interconnect for systems of over 100,000 nodes; functional interconnect.
Single CPU per node configuration: high memory bandwidth and a simple memory hierarchy.
CPU/ICC direct water cooling: high reliability, low power consumption, and compact packaging.
System totals: LINPACK 10 PFlops, over 1 PB of memory, 800 racks, 80,000 CPUs, 640,000 cores.
Fujitsu's technologies applied to the K computer (cont.)
Software environment, layered from the applications down to a Linux-based OS enhanced for the K computer:
HPC Portal / System Management Portal.
System operations management: system configuration management, system monitoring, system installation and operation.
Job operations management: job manager, job scheduler, resource management.
High-performance file system: Lustre-based distributed file system with high scalability, I/O bandwidth guarantee, and high reliability and availability.
Compilers (Fortran, C, C++): hybrid parallel programming, sector cache support, SIMD and register file extensions.
MPI and math libraries: tuned for the hardware.
Support tools: profiler and tuning tools, interactive debugger.
The K computer's Performance: Productivity
LINPACK performance and its efficiency (Rmax: LINPACK performance / Rpeak: peak performance).
[Figure: efficiency (Rmax/Rpeak) vs. performance (Rmax, PFlops), June 2011. Labeled systems: GSIC (Japan), NSCS (China), NSCT (China), Jaguar (US, Opteron), other Fujitsu systems, and the K computer (subset, SPARC64™ VIIIfx).]
K computer (subset): 68,544 CPUs, 548,352 cores; 8.162 PFlops at 93.0% efficiency.
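The 93.0% figure follows directly from Rmax / Rpeak. A minimal sketch of that arithmetic, assuming Rpeak is simply the CPU count times the 128 GFlops per-CPU peak quoted for the SPARC64 VIIIfx; the helper names are our own.

```python
# Per-CPU peak of the SPARC64 VIIIfx, as quoted in this presentation (GFlops).
PEAK_PER_CPU_GFLOPS = 128
CORES_PER_CPU = 8  # 548,352 cores / 68,544 CPUs


def rpeak_pflops(cpus):
    """Theoretical peak (Rpeak) in PFlops for a given number of CPUs."""
    return cpus * PEAK_PER_CPU_GFLOPS / 1e6


def efficiency(rmax_pflops, cpus):
    """LINPACK efficiency Rmax / Rpeak as a fraction."""
    return rmax_pflops / rpeak_pflops(cpus)


subset_cpus = 68_544
print(f"cores: {subset_cpus * CORES_PER_CPU:,}")                # cores: 548,352
print(f"Rpeak: {rpeak_pflops(subset_cpus):.3f} PFlops")         # Rpeak: 8.774 PFlops
print(f"efficiency: {efficiency(8.162, subset_cpus):.1%}")      # efficiency: 93.0%
print(f"full-system Rpeak: {rpeak_pflops(80_000):.2f} PFlops")  # full-system Rpeak: 10.24 PFlops
```

The same helper applied to the planned 80,000 CPUs gives a peak of about 10.24 PFlops, so reaching the 10 PFlops LINPACK target requires an efficiency close to the one measured on the subset.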
The K computer's Performance (cont.): Greenness
LINPACK performance and its power consumption.
[Figure: power efficiency (Rmax MFlops/W) vs. performance (Rmax, PFlops), June 2011. Labeled systems: IBM BlueGene/Q prototype (US, PowerBQC), Nagasaki Univ. (Japan), CINECA/SCS (Italy), GSIC (Japan), NSCT (China), FZJ (Germany, XCell), and the K computer (subset, SPARC64™ VIIIfx).]
K computer (subset): 825 MFlops/W.
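The slide gives power efficiency rather than power, but the two LINPACK figures together imply the power draw during the run. A back-of-the-envelope sketch, assuming power = Rmax / (MFlops/W); because 825 MFlops/W is itself a rounded value, the result is approximate.

```python
def implied_power_mw(rmax_pflops, mflops_per_watt):
    """Power (MW) implied by a LINPACK result and its power efficiency.

    power [W] = Rmax [Flops] / efficiency [Flops/W]
    """
    rmax_flops = rmax_pflops * 1e15
    flops_per_watt = mflops_per_watt * 1e6
    return rmax_flops / flops_per_watt / 1e6  # convert W to MW


print(f"{implied_power_mw(8.162, 825):.2f} MW")  # 9.89 MW
```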
The K computer's Performance (cont.): LINPACK performance and its computing time
[Figure: computing time (hours) vs. performance (Rmax, PFlops), June 2011. Labeled systems: JAXA (Japan, FX1, SPARC64™ VII), Jaguar (US, Opteron), NSCT (China), and the K computer (subset, SPARC64™ VIIIfx).]
K computer (subset): 28 hours.
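A long run time is a consequence of problem size: LINPACK solves a dense N × N system with roughly (2/3)N³ floating-point operations, so Rmax and the run time together imply the order of N. The sketch below keeps only that leading term and assumes Rmax was sustained for the whole 28 hours; it is an order-of-magnitude estimate, not the N actually used.

```python
def estimate_hpl_order(rmax_pflops, hours):
    """Approximate LINPACK matrix order N from sustained Rmax and run time.

    Uses only the leading (2/3) * N**3 term of the operation count and
    assumes Rmax was sustained for the entire run.
    """
    total_flops = rmax_pflops * 1e15 * hours * 3600
    return (1.5 * total_flops) ** (1 / 3)


n = estimate_hpl_order(8.162, 28)
print(f"N ~ {n:.3e}")
```

Because the time grows as N³ while the achievable N grows with memory size, larger systems running the benchmark at full memory tend toward longer runs.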
The K computer's Performance (cont.): Productivity and Greenness
LINPACK performance efficiency (Rmax: LINPACK performance / Rpeak: peak performance) and power efficiency.
[Figure: efficiency (Rmax/Rpeak) vs. power efficiency (Rmax MFlops/W), June 2011. Labeled systems: K computer (subset, SPARC64™ VIIIfx), NSCT (China), CINECA/SCS (Italy), GSIC (Japan), IBM BlueGene/Q prototype (US, PowerBQC), Nagasaki Univ. (Japan), and FZJ (Germany) and others (XCell). Power data not registered for some systems.]
The K computer's Performance (cont.): Scalability
Example of fundamental benchmark (BMT) performance on a 1.05 PFlops system*, showing highly efficient threading between cores and the functional interconnect.
[Figure: scalability of HIMENO-BMT*** (XL size, 1,024 × 512 × 512) vs. number of cores, for three modes: hybrid execution** with integrated MPI support, hybrid execution** without integrated MPI support, and flat MPI execution with integrated MPI support.]
*: 65,536 cores, 8,192 CPUs. **: 8 threads/node + MPI. ***: HIMENO-BMT, a benchmark program that measures the speed of the major loops that solve Poisson's equation using the Jacobi iteration method. Grid size XL was used in this measurement.
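The Jacobi iteration that HIMENO-BMT times can be sketched in a few lines. This is a simplified stand-in, not the benchmark's code: the actual benchmark uses a more general stencil with coefficient arrays, whereas this sketch assumes the textbook 7-point stencil for −∇²u = f on a uniform grid with zero boundary values, and all names are our own.

```python
def jacobi_poisson_3d(f, h, iterations):
    """Jacobi iteration for the 3-D Poisson equation -laplace(u) = f.

    Uses the 7-point stencil on a uniform grid with spacing h and zero
    (Dirichlet) boundary values. `f` is a nested list f[i][j][k].
    Returns the approximate solution and the last sweep's max change.
    """
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    u = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    change = 0.0
    for _ in range(iterations):
        new = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
        change = 0.0
        # Update interior points only; boundary points stay at zero.
        for i in range(1, nx - 1):
            for j in range(1, ny - 1):
                for k in range(1, nz - 1):
                    s = (u[i - 1][j][k] + u[i + 1][j][k]
                         + u[i][j - 1][k] + u[i][j + 1][k]
                         + u[i][j][k - 1] + u[i][j][k + 1])
                    new[i][j][k] = (s + h * h * f[i][j][k]) / 6.0
                    change = max(change, abs(new[i][j][k] - u[i][j][k]))
        u = new  # Jacobi: each sweep reads only the previous sweep's values
    return u, change


n = 10
f = [[[1.0] * n for _ in range(n)] for _ in range(n)]
u, change = jacobi_poisson_3d(f, 1.0 / (n - 1), 150)
print(f"value near center: {u[n // 2][n // 2][n // 2]:.5f}, last change: {change:.2e}")
```

Because every point in a sweep depends only on the previous sweep, the grid can be split freely across threads and processes. In the measurement above, that split was either hybrid (8 threads per node plus MPI between nodes) or flat MPI with one process per core; this sketch does not model the decomposition.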
Applications running on the K computer
Several real applications are now running on the K computer, which is in its test-operation phase. The first-priority applications have been optimized, tested, and evaluated:
NICAM (Earth science): Nonhydrostatic ICosahedral Atmospheric Model (NICAM) for global cloud-resolving simulations. Scheme: FDM (atmosphere).
Seism3D (Earth science): simulation of seismic-wave propagation and strong ground motions. Scheme: FDM (wave).
FrontFlow/Blue (Engineering): unsteady flow analysis based on Large Eddy Simulation (LES). Scheme: FEM (fluid).
PHASE (Material science): first-principles simulation within the plane-wave pseudopotential formalism. Scheme: DFT (plane wave).
RSDFT (Material science): ab initio calculation in real space. Scheme: real-space DFT.
LatticeQCD (Physics): study of elementary particle and nuclear physics based on lattice QCD simulation. Scheme: QCD.
Others: more than 20 applications are being optimized and tested on the K computer.
For the post-10 PFlops era and Exa-scale computing
Candidate technologies: many-core architecture, accelerator technologies, CPU-integrated interconnect I/F, on-board optical links, 3D stacked memory, sub-20 nm semiconductor technology, FPGA and reconfigurable LSI, CNT technologies, graphene technologies, optical computers, quantum computers, and DNA computers.
The path runs from expanding and refining current technologies for practical 10 PFlops class computing to a technology jump for practical 100 PFlops class computing, with tight collaboration, co-work, and concurrent development with target applications.
The Japanese HPC community's big question after March 11th: How can Exa-scale computing contribute to society? Which applications need Exa-scale computing power?
Conclusion
The K computer targeted practical PFlops-class computing. Several of Fujitsu's leading-edge technologies were applied to the K computer, achieving excellent performance, productivity, and operability.
How to use this huge computing power to bring about a safe, reliable, and sustainable society is Fujitsu's next and true challenge.
This is a milestone on the way to real Exa-scale computing, and Fujitsu will continue its efforts toward that goal.