HPC and Big Data: Updates about China. Haohuan FU August 29 th, 2017

Similar documents
CHAO YANG. Early Experience on Optimizations of Application Codes on the Sunway TaihuLight Supercomputer

INSPUR and HPC Innovation

Sunway TaihuLight: The system and applications. Zhao Liu Director of Application Support Department National Supercomputing Center in Wuxi

Supercomputer Networking via IPv6. Xing Li, Congxiao Bao, Wusheng Zhang

Implicit Low-Order Unstructured Finite-Element Multiple Simulation Enhanced by Dense Computation using OpenACC

INSPUR and HPC Innovation. Dong Qi (Forrest) Oversea PM

Introduction to National Supercomputing Centre in Guangzhou and Opportunities for International Collaboration

The Architecture and the Application Performance of the Earth Simulator

System Packaging Solution for Future High Performance Computing May 31, 2018 Shunichi Kikuchi Fujitsu Limited

Overview of Tianhe-2

Optimizing Convolutional Neural Networks on Sunway TaihuLight Supercomputer

Green Supercomputing

Towards Exascale Computing with the Atmospheric Model NUMA

HPC with GPU and its applications from Inspur. Haibo Xie, Ph.D

Presentations: Jack Dongarra, University of Tennessee & ORNL. The HPL Benchmark: Past, Present & Future. Mike Heroux, Sandia National Laboratories

Chapter 5 Supercomputers

HETEROGENEOUS HPC, ARCHITECTURAL OPTIMIZATION, AND NVLINK STEVE OBERLIN CTO, TESLA ACCELERATED COMPUTING NVIDIA

The Earth Simulator Current Status

Architectures for Scalable Media Object Search

HPC Algorithms and Applications

Interconnect Your Future Enabling the Best Datacenter Return on Investment. TOP500 Supercomputers, November 2017

The State and Opportunities of HPC Applications in China. Ruibo Wang National University of Defense Technology

Titan - Early Experience with the Titan System at Oak Ridge National Laboratory

swsptrsv: a Fast Sparse Triangular Solve with Sparse Level Tile Layout on Sunway Architecture Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu

Introduction to Parallel and Distributed Computing. Linh B. Ngo CPSC 3620

Practical Scientific Computing

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

GPU Developments in Atmospheric Sciences

CRAY XK6 REDEFINING SUPERCOMPUTING. - Sanjana Rakhecha - Nishad Nerurkar

Experiences of the Development of the Supercomputers

CafeGPI. Single-Sided Communication for Scalable Deep Learning

Scaling Deep Learning. Bryan

CUDA. Matthew Joyner, Jeremy Williams

ACCELERATED COMPUTING: THE PATH FORWARD. Jen-Hsun Huang, Co-Founder and CEO, NVIDIA SC15 Nov. 16, 2015

High Performance Computing

Mathematical computations with GPUs

Introduction to High-Performance Computing

Report on the Sunway TaihuLight System. Jack Dongarra. University of Tennessee. Oak Ridge National Laboratory

Building NVLink for Developers

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

NVIDIA GPU TECHNOLOGY UPDATE

Accelerated Earthquake Simulations

INSPUR HPC Development

represent parallel computers, so distributed systems such as Does not consider storage or I/O issues

Fujitsu s Technologies to the K Computer

Timothy Lanfear, NVIDIA HPC

High Performance Linear Algebra

NVIDIA Update and Directions on GPU Acceleration for Earth System Models

GPU Consideration for Next Generation Weather (and Climate) Simulations

Tianhe-2, the world s fastest supercomputer. Shaohua Wu Senior HPC application development engineer

Testbed Overview in China Xiaohong Huang Beijing University of Posts and Telecommunications

Performance Evaluation of a Vector Supercomputer SX-Aurora TSUBASA

Alexander Heinecke (Intel), Josh Tobin (UCSD), Alexander Breuer (UCSD), Charles Yount (Intel), Yifeng Cui (UCSD) Parallel Computing Lab Intel Labs

SKA. data processing, storage, distribution. Jean-Marc Denis Head of Strategy, BigData & HPC. Oct. 16th, 2017 Paris. Coyright Atos 2017

The Future of High- Performance Computing

Parallel Computing & Accelerators. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability

Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA

Building the Most Efficient Machine Learning System

Deep Learning: Transforming Engineering and Science The MathWorks, Inc.

Fast Hardware For AI

TESLA P100 PERFORMANCE GUIDE. Deep Learning and HPC Applications

AI Application and Development in ehealth Field. MIN Dong

PART I - Fundamentals of Parallel Computing

The Future of High Performance Interconnects

Realtime Photometry System for AST3 (AST3_RPS)

LCOC Lustre Cache on Client

The Exascale Era Has Arrived

China s HPC development: a brief review and perspectives

A Simulation of Global Atmosphere Model NICAM on TSUBAME 2.5 Using OpenACC

Bandwidth-Centric Deep Learning Processing through Software-Hardware Co-Design

The knight makes his play for the crown Phi & Omni-Path Glenn Rosenberg Computer Insights UK 2016

Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning

OPTIMIZED GPU KERNELS FOR DEEP LEARNING. Amir Khosrowshahi

AI for HPC and HPC for AI Workflows: The Differences, Gaps and Opportunities with Data Management

TOWARDS ACCELERATED DEEP LEARNING IN HPC AND HYPERSCALE ARCHITECTURES Environnement logiciel pour l apprentissage profond dans un contexte HPC

Brand-New Vector Supercomputer

Building the Most Efficient Machine Learning System

Object recognition and computer vision using MATLAB and NVIDIA Deep Learning SDK

TESLA P100 PERFORMANCE GUIDE. HPC and Deep Learning Applications

Mapping MPI+X Applications to Multi-GPU Architectures

Godson Processor and its Application in High Performance Computers

Zhang HPC Application R&D Manager,Inspur

DEEP LEARNING WITH GPUS Maxim Milakov, Senior HPC DevTech Engineer, NVIDIA

Deep learning in MATLAB From Concept to CUDA Code

High Performance Computing. Leopold Grinberg T. J. Watson IBM Research Center, USA

Practical Scientific Computing

The Effect of In-Network Computing-Capable Interconnects on the Scalability of CAE Simulations

Memory Footprint of Locality Information On Many-Core Platforms Brice Goglin Inria Bordeaux Sud-Ouest France 2018/05/25

Current Status of the Next- Generation Supercomputer in Japan. YOKOKAWA, Mitsuo Next-Generation Supercomputer R&D Center RIKEN

It s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist

Exploiting the OpenPOWER Platform for Big Data Analytics and Cognitive. Rajesh Bordawekar and Ruchir Puri IBM T. J. Watson Research Center

Cloud Computing with FPGA-based NVMe SSDs

DISP: Optimizations Towards Scalable MPI Startup

Designing High-Performance MPI Collectives in MVAPICH2 for HPC and Deep Learning

High-Performance Broadcast for Streaming and Deep Learning

Matrix-free multi-gpu Implementation of Elliptic Solvers for strongly anisotropic PDEs

HPC projects. Grischa Bolls

Implementing Deep Learning for Video Analytics on Tegra X1.

Analysis of Performance Gap Between OpenACC and the Native Approach on P100 GPU and SW26010: A Case Study with GTC-P

Transcription:

HPC and Big Data: Updates about China Haohuan FU August 29 th, 2017 1

Outline HPC and Big Data Projects in China Recent Efforts on Tianhe-2 Recent Efforts on Sunway TaihuLight 2

MOST HPC Projects 2016 n Exa-scale pilot system (three parallel systems) q Sunway successor deep learning benchmark q Tianhe successor q Sugon n Physics model and numerical methods for exa-scale systems n Programming framework n HPC environment service and supporting system n HPC numerical simulator q aircraft simulator q earth simulator n Other typical HPC applications q complex EM environment simulation q ocean and wave modeling q mechanical engineering 310 Million, 19 Projects q material science 3

MOST Cloud Computing and Big Data Projects 2016 n Software defined cloud computing: basic theory and methods n Big data storage technology and platform n Dataflow oriented big data analytic system n Network operating system for cloud computing n Scientific big data management system n Big data management system for advanced manufacturing n Big data based intelligent software development method and environment n Big data knowledge engineering: basic theory and applications n Human-computer interaction n VR, AR 389 Million, 15 Projects 4

Outline HPC and Big Data Projects in China Recent Efforts on Tianhe-2 Recent Efforts on Sunway TaihuLight 5

BigData Science Research Center NSFC-GuangDong Joint Project based on Tianhe-2 q 2016-2020, 300Million, steering by NSCC-GZ Big data projects focus on Smart City q q q q q Transportation 智能交通 Medical & Health 智慧医疗与健康 Disaster prevention 智慧防灾 Finance 智慧金融 Education 智慧教育 q Social Management 智慧管理 Convergence of talent and technology resources, jointly solve key problems of big data science

BigData Science Research Center Bigdata + HPC Pearl River Delta National Bigdata Integration Test Zone

Outline HPC and Big Data Projects in China Recent Efforts on Tianhe-2 Recent Efforts on Sunway TaihuLight 8

2016 Highlights Over 60 large-scale applications from over 100 research institutes covering 19 application domains, 6 fullscale applications, 18 half-scale, 22 million-core-scale, 3 Gordon Bell Finals, and 1 Gordon Bell Prize. SC 2016 Gordon Bell Prize ISC 2016 World Internet Conference International Workshop on HPC Architecture, Software, and Application at an Extreme Scale ygw@tsinghua.edu.cn www.nsccwx.cn/wxcyw

Sunway TaihuLight: Overview System TaihuLight Tianhe-2 Titan Sequoia Cori Peak Performance (PFlops) 125.4 54.9 27.1 20.1 27.9 Total Memory (TB) 1310 1024 710 1572 879 Linpack Performance (PFlops) 93.0(74%) 33.9(62%) 17.6(65%) 17.2(85.3) 14.0(50%) Performance/Power (Mflops/W) 6051.3 1901.5 2142.8 2176.6 3266.8 GTEPS 23755.7 2061.48 ### 23751 ### HPCG (Pflops) 0.3712 0.5801 0.3223 0.3304 0.3554 Rank of Top500 1 2 3 4 5 Rank of Green500 4 135 100 90 26 Rank of Graph500 2 8 ### 3 ### Rank of HPCG 4 2 7 6 5 10

Other Resources n 1 Pflops Commercial System q 980 compute nodes:two-way 12-core E5-2680 V3, 2.5GHz, 128GB q 32 fat nodes:8-way 16-core, 2.2GHz, 1TB q 64 GPU nodes: two-way 8-core, 2.4GHz, 128GB, Tesla K40, 1TB SATA +1.6TB SSD n 100 Gb connection to CERNET 11

Atmospheric Modeling Institute of Software, CAS Tsinghua University Beijing Normal University 10-million-core scalable full implicit solver for non-hydrostatic atmospheric dynamics support up to 500m resolution sustained performance of 7.95 Pflops 2016 Gordon Bell Prize winner

Wave Model First Institute of Oceanography (FIO) Tsinghua University MASNUM (Key laboratory of MArine Science and NUmerical Modeling) wave model sustained performance 45.43 Pflops global 1km wave simulation 2016 Gordon Bell Prize finalist

Phase Field Simulation Computer Network Information Center, CAS Large Scale Phase Field Simulation for Coarsening Dynamics Based on Cahn-Hilliard Equation with Degenerated Mobility over 50 Pflops sustained performance highly scalable, large time-step integrating algorithm

The CESM Project on Sunway TaihuLight CAM5.0 POP2.0 CPL7 CICE4.0 CLM4.0 CESM1.2.0 Tsinghua + BNU 30+ Professors and Students Four component models, millions lines of code Large-scale run on Sunway TaihuLight 24,000 MPI processes Over one million cores 10-20x speedup for kernels 2-3x speedup for the entire model Refactoring and Optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight Supercomputer, in Proceedings of SC 2016. 15

Library for Deep Learning (swdnn) n swdnn: Provide interface for optimized basic operators q Fully-connected layer (BLAS); Pooling layer q Activation function; Batch Normalization q *Convolutional Layer(90% time for CNN) Related Works on other architectures Work Platform Method cudnn(2014) GPU GEMM fbtfft(2014) GPU FFT Andrew Lavin (2015) GPU Winograd Chen Zhang (2015) FPGA Direct Conv swdnn SW26010 Blocking GEMM 16

Library for Deep Learning (swdnn) n Performance q Convolutional performance above 1.6 Tflops with double-precision q Speedup ranging from 1.91x to 9.75x compared with cudnnv5.1. 17

Framework for Deep Learning (under development) n Distributed framework q Customized from Caffe with less dependencies q Two-level Parameter Server Based-on MPI Global-Server Sever-Cache Sever-Cache Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker Worker 18

swdnn Supported Project: Sunway-Lingo collaborated with Prof. Zhiqing Liu, BUPT n Original go board to be processed n Converted to a 48-channel image fed to deep CNN with essential go features such as liberties n Order of probabilities of plausible moves as outputted by policy network 19

Long Term Plan n Traditional HPC Applications q weather / climate service q seismic data processing service q CFD simulation framework for Advanced Manufacturing n Deep Learning Related Applications q the swdnn framework q collaborating with face++ for face recognition applications q collaborating with Sogou for voice recognition and translation q customized DNN Sunway chip? n Big Data Center q National Health and Medical Big Data Center at Nanjing 20