China's HPC development: a brief review and perspectives
1 China's HPC development: a brief review and perspectives Depei Qian Beihang University/Sun Yat-sen University International Symposium on Impact of extreme scale computing Tokyo, Japan Nov. 2, 2017
2 Outline
- A brief review
- The new key HPC project in China
- Issues in exascale system development
3 A Brief review
4 Three 863 key projects on HPC
- High Performance Computer and Core Software: research on resource sharing and collaborative work; grid-enabled applications in multiple areas; TFlops computers and the China National Grid (CNGrid) testbed
- High Productivity Computer and Grid Service Environment: high productivity (application performance, efficiency in program development, portability of programs, robustness of the system); emphasizing service features of the HPC environment; developing peta-scale computers
- High Productivity Computer and Application Service Environment: developing 100PF computers; developing large-scale HPC applications; upgrading of CNGrid
5 High performance computers
- 2013: Tianhe-2. CPU+MIC heterogeneous accelerated architecture; 54.9 PF peak, 33.9 PF Linpack; No. 1 in Top500 six times from 2013 to 2015; installed at the National Supercomputing Center in Guangzhou; will be upgraded to 100PF this year
- 2016: Sunway TaihuLight. Implemented with home-grown Shenwei many-core processors, 10 million cores in total; 125 PF peak, 93 PF Linpack; No. 1 in Top500 in June and Nov. of 2016; installed at the National Supercomputing Center in Wuxi
[Photos: Tianhe-2, Sunway Bluelight]
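As a back-of-envelope check, the Linpack efficiencies of the two systems follow directly from the peak and Linpack figures quoted on this slide (a minimal sketch, using only those numbers):

```python
# Linpack efficiency = sustained Linpack performance / peak performance,
# using the Pflops figures quoted on the slide.
tianhe2 = {"peak_pf": 54.9, "linpack_pf": 33.9}
taihulight = {"peak_pf": 125.0, "linpack_pf": 93.0}

def linpack_efficiency(system):
    return system["linpack_pf"] / system["peak_pf"]

print(f"Tianhe-2:   {linpack_efficiency(tianhe2):.1%}")    # ~61.7%
print(f"TaihuLight: {linpack_efficiency(taihulight):.1%}") # 74.4%
```

The gap illustrates why the talk later sets an explicit >60% Linpack-efficiency target for the exascale machine.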
6 Tianhe-2 upgrade (Tianhe-2 → Tianhe-2A)
- Nodes & performance: nodes with Intel CPU + KNC, 54.9 Pflops → nodes with Intel CPU + Matrix-2000, Pflops
- Interconnection: 10 Gbps, 1.57 us → 14 Gbps, 1 us
- Memory: 1.4 PB → 3.4 PB
- Storage: 12.4 PB, 512 GB/s → 19 PB, 1 TB/s
- Energy efficiency: 17.8 MW, 1.9 Gflops/W → about 18 MW, >5 Gflops/W
- Heterogeneous software: MPSS for Intel KNC → OpenMP/OpenCL for Matrix-2000
7 Matrix-2000 accelerator
Chip specification:
- 4 super-nodes (SN), 8 clusters per SN, 4 cores per cluster
- Self-defined 256-bit vector ISA, 16 DP flops/cycle per core
- Peak performance: 4 SNs x 8 clusters x 4 cores x 16 flops x 1.2 GHz = 2.4576 Tflops
- Peak power dissipation: ~240 W
- Interface: 8 DDR4 channels, x16 PCIe 3.0 EP port
[Figure: on-chip interconnection of the four super-nodes, each containing 8 clusters of 4 cores, with PCIe and four DDR4 interfaces]
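The peak-performance formula on this slide can be verified directly (a sketch using only the numbers given here):

```python
# Matrix-2000 peak: 4 super-nodes x 8 clusters x 4 cores,
# 16 DP flops/cycle per core, at 1.2 GHz.
supernodes, clusters_per_sn, cores_per_cluster = 4, 8, 4
flops_per_cycle, clock_ghz = 16, 1.2

total_cores = supernodes * clusters_per_sn * cores_per_cluster  # 128 cores
peak_gflops = total_cores * flops_per_cycle * clock_ghz         # 2457.6 Gflops
print(f"{total_cores} cores, peak = {peak_gflops / 1000:.4f} Tflops")  # 2.4576 Tflops
```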
8 Compute nodes
Heterogeneous compute node:
- Intel Xeon CPU x2 + Matrix-2000 x2
- Memory: 192 GB
- Interconnection: 14G proprietary network
- Peak performance: 5.34 Tflops
[Figure: node block diagram with CPUs, MT-2000 accelerators, DDR4 memory, 16x PCIe links, NIC, and GbE LAN/management ports]
9 HPC environment 2016: China National Grid, composed of 17 national supercomputing centers and HPC centers, providing world-leading computing resources
10 HPC applications 2016
- HPC applications in many domains; 10-million-core parallelism reached, Gordon Bell Prize in 2016
- Developed a number of application software packages adopted in production: aircraft design, high-speed train design, oil & gas exploration, new drug discovery, ensemble weather forecasting, bio-informatics, car development, design optimization of large fluid machinery, electromagnetic computation
11 Problems identified
- Lack of a long-term national program for high performance computing
- Weak in core HPC technologies: processors/accelerators; novel devices (new memory, storage, and network); implementation of large-scale parallel algorithms and programs
- Application software is the bottleneck: applications rely on imported commercial software, which is expensive, offers small-scale parallelism, and is restricted by export regulation
- Shortage of cross-disciplinary talent: not enough people with both domain and IT knowledge; lack of multi-disciplinary collaboration
12 The new key HPC project in China
13 Reform of research system in China
The national research and development system is undergoing a reform: 100+ different national R&D programs/initiatives are merged into 5 tracks of national programs:
- Basic research program (NSFC)
- Mega-science and technology programs
- Key R&D program (former 863, 973, enabling programs)
- Enterprise innovation program
- Facility/talent program
14 A new key project on HPC
- High performance computing has been identified as a priority subject under the key R&D program (track 3)
- Strategic studies and planning have been conducted since 2013
- A proposal on HPC in the 13th five-year plan was submitted in early 2015
- The key R&D project was approved in Oct. by a multi-government agency committee led by the MOST
15 Motivations
The key value of exascale computers identified:
- Addressing the grand challenge problems: energy shortage, pollution, climate change
- Enabling industry transformation: supporting development of important products (high-speed train, commercial aircraft, automobile); promoting economic transformation
- For social development and people's benefit: new drug discovery, precision medicine, digital media
- Enabling scientific discovery: high energy physics, computational chemistry, new materials, astrophysics
- Promoting the computer industry through technology transfer
- Developing HPC systems with self-controllable technologies: a lesson learnt from the recent embargo regulation
16 Major tasks
- Exa-scale computer development: R&D on novel architectures and key technologies of the exa-scale computer; developing the exa-scale computer based on home-grown processors; technology transfer to promote development of high-end servers
- HPC applications development: basic research on exa-scale modeling methods and parallel algorithms; developing high performance application software; establishing the HPC application eco-system
- HPC environment development: developing software and platforms for the national HPC environment; upgrading the national HPC environment CNGrid; developing service systems on the national HPC environment
Each task will cover basic research, key technology development, and application demonstration
17 Task 1: Exa-scale computer development (basic research)
- Novel high performance interconnect: theoretical work on the novel interconnect based on the enabling technologies of 3D chips, silicon photonics and on-chip networks
- Programming & execution models for exa-scale systems: new programming models for heterogeneous systems; improving programming efficiency
18 Task 1: Exa-scale computer development (key technology)
- Prototype systems for verifying the exa-scale system technologies; 3 typical applications to verify the design
- Exa-scale computer technologies: architecture optimized for multiple objectives; highly efficient compute nodes; high performance processor/accelerator design; exa-scale system software; scalable interconnect; parallel I/O; exa-scale infrastructure; energy efficiency; exa-scale system reliability
19 Task 1: Exa-scale computer development
Exa-scale computer targets:
- exaflops-level peak performance, Linpack efficiency >60%
- 10 PB memory, EB-level storage
- 30 GF/W energy efficiency
- interconnect >500 Gbps
- large-scale system management and resource scheduling
- easy-to-use parallel programming environment
- system monitoring and fault tolerance
- support for large-scale applications
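The targets above can be cross-checked against each other: at 30 Gflops/W, a machine with an exaflops peak draws roughly 33 MW (a sketch; the 1-exaflops peak is the conventional definition of exascale rather than a figure stated on this slide):

```python
peak_flops = 1e18        # 1 exaflops peak (assumed definition of exascale)
gflops_per_watt = 30e9   # 30 GF/W energy-efficiency target from the slide
linpack_eff = 0.60       # >60% Linpack efficiency target from the slide

power_mw = peak_flops / gflops_per_watt / 1e6
sustained_eflops = peak_flops * linpack_eff / 1e18
print(f"Power at peak: {power_mw:.1f} MW")              # ~33.3 MW
print(f"Sustained Linpack floor: {sustained_eflops:.2f} Eflops")
```

This is in line with the 20-40 MW power envelopes commonly discussed for first-generation exascale systems.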
20 Task 2: HPC application development
- Basic research: computable modeling and computational methods for exa-scale systems; scalable, highly efficient parallel algorithms and parallel libraries for exa-scale systems
- Key technology: programming framework for exa-scale software development
21 Task 2: HPC application development (application software)
- Numerical devices: numerical nuclear reactor, numerical aircraft, numerical earth system, numerical engine
- High performance domain application software for complex engineering projects and critical equipment: numerical simulation of the ocean, design of energy-efficient large fluid machinery, drug discovery, electromagnetic environment simulation, ship design, oil exploration, digital media rendering
- High performance application software for research: material science, high energy physics, astrophysics, life science
22 Task 2: HPC application development
HPC application software development:
- establishing a national-level R&D center for HPC application software
- building a platform for HPC software development and optimization, with tools for performance/energy efficiency and pre-/post-processing
- building a software resource repository
- developing typical domain application software
- a joint effort involving national supercomputing centers, universities, and institutes
23 Task 3: HPC environment development
- Basic research: models and architecture for computational services; virtual data space
- Key technology: mechanisms and platform for the national HPC environment, providing technical support for service-mode operation; upgrading the national HPC environment (CNGrid)
24 Task 3: HPC environment development (services)
- Integrated business platform, e.g. complex product design, HPC-enabled EDA platform
- Application villages: innovation and optimization of industrial products, drug discovery, SME computing and simulation platform
- Platform for HPC education: providing computing resources and services to undergraduate and graduate students
25 Projects supported
- The first call for proposals was issued in Feb., projects supported
- The second call was issued in Oct. 2016; 18 projects supported, mainly application software
- The third round of calls was issued in Oct. 2017; the review process will begin soon.
26 Sugon exa-prototype: specification (prototype vs. exascale, with scaling ratio)
- Computing: node peak (TF); number of nodes; number of silicon-units; system peak (PF)
- Storage: memory (PB); storage (PB)
- Network: silicon-switch; global net dimensions 2*1*3 vs. 8*8*6 (ratio 4*8*2); local net dimensions 2*3*2 vs. 2*3*2 (ratio 1)
- Power: power consumption; energy efficiency (GF/W)
- Size W*D*H (m): 6*6*6 vs. 24*24*6 (ratio 16); total cabinets
27 Sugon exa-prototype: general design
- Computing sub-system: home-grown x86 processor + DCU accelerator in 2019; CPU > 1 TF, DCU > 15 TF
- Network sub-system: 400 Gbps 6D-torus, 384 routers
- Storage sub-system: distributed storage architecture, extensible to EB
- Infrastructure sub-system: immersive phase-change cooling; high voltage DC power supply; hierarchical 3D assembly
- Software sub-system: mature and complete libraries and programming tools; light-weight virtualization and software-defined architecture
28 Sugon exa-prototype: hierarchical 3D structure
Levels (with nodes per unit, units in the prototype, and units in the exascale machine): node pair, super node, silicon block, silicon cubic
29 Sugon exa-prototype: computing node
- Node: 2 CPUs and 2 DCUs; CPU and DCU interconnected by the GOP high-speed bus
- Memory: 128 GB DDR4-2667
- Interconnect: 200 Gbps fast fabric (2x200G NIC)
[Figure: board block diagram with CPUs and DCUs linked by 16x GOP lanes, DDR4 DIMMs, 16x PCIe, SATA/M.2 storage, and a midplane connection]
30 Tianhe exa-prototype: flexible architecture
- Reconfigurable flexible architecture to meet the requirements of different applications
- Virtualized OS, providing a configurable computing environment
- Software-defined interconnect, guaranteeing bandwidth and fault isolation
- Hierarchical storage QoS guarantee technology, providing stable and independent storage bandwidth
- Dynamic optimization, providing architecture-aware optimization across application, compiler, runtime and OS, spanning the computing nodes, the computing sub-system and the IO/storage sub-system
31 Tianhe exa-prototype: technical route
Trading off performance, energy efficiency and ease of use across the spectrum from special purpose accelerators to customized many-core: general purpose many-core is adopted by the prototype
32 Tianhe exa-prototype: technical features
- Flexible architecture to meet the requirements of different applications
- New generation many-core processor, pursuing balanced computing and memory access
- Optoelectronic integrated high speed interconnect, greatly improving performance and energy efficiency
- Fault tolerance based on new storage media
- Accurate heat dissipation, trading off manufacturing cost against operational cost
33 Tianhe exa-prototype: interconnect
- High-radix router for low power consumption, low cost and high density
- Exascale communication need: single node > 400 Gbps
- Chip power budget <200 W, at most 12 ports of 400 Gbps
- Co-design of ultra-short-distance SerDes PHY, PHY coding, and link layer
- Optoelectronic integration for the interconnect
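The port limit on this slide corresponds to an energy-per-bit budget: with under 200 W per chip and 12 ports of 400 Gbps, the router can spend at most about 42 pJ per transmitted bit (a sketch derived from the slide's numbers):

```python
power_w = 200.0        # router chip power budget from the slide
ports, gbps = 12, 400  # at most 12 ports of 400 Gbps

aggregate_bps = ports * gbps * 1e9           # 4.8 Tb/s in one direction
pj_per_bit = power_w / aggregate_bps * 1e12  # energy budget per bit
print(f"Aggregate: {aggregate_bps / 1e12:.1f} Tb/s, budget: {pj_per_bit:.1f} pJ/bit")
```

Tightening this pJ/bit figure is exactly what the co-designed SerDes PHY and optoelectronic integration mentioned above aim at.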
34 Sunway exa-prototype: hardware system
- System composed of computing, interconnect, storage, power supply and cooling
- New generation many-core based system, 512 nodes, performance >4 PFlops
- Self-developed network chip, fat-tree interconnect, point-to-point bandwidth > 200 Gbps
- Storage subsystem based on Shenwei storage servers
- Self-developed high voltage (300 V) DC power supply
- Highly efficient water cooling with enhanced heat-transfer copper cold plates
[Figures: two-level fat-tree interconnect, DC power supply system, new-generation many-core processor, cold-plate assembled nodes, compute cabinet, water cooling unit]
35 Sunway exa-prototype: computing node
- Peak performance >8 TFlops, memory >64 GB
- Connection to the interconnect: 2 x 25 Gbps x 4; point-to-point one-way bandwidth 200 Gbps
[Figure: node block diagram with four core groups, DDR3/DDR4 memory channels, network interfaces to the high-speed compute network and the Ethernet management network, BMC, clock/processor/power management, and node monitoring]
36 Sunway exa-prototype: software system
Basic software for the home-grown many-core processor: parallel OS, high performance storage management system, parallel compiler, parallel program development environment
- Highly efficient compiler for heterogeneous many-core, with SIMD auto-vectorization
- High performance basic math libraries
- Integrated multi-domain OS for heterogeneous many-core
- Dynamic storage management
- Support for MPI-1/2/3 and OpenMP 3.0, compatible with OpenACC 2.0
- Debugger for heterogeneous many-core
37 Sunway exa-prototype: demo applications
Applications are being ported to TaihuLight and performance optimization is being conducted: floating platform design, seismic processing, aircraft design, ocean modeling
38 Sunway exa-prototype: applications
10-million-core applications on TaihuLight
2016:
- Fully Implicit Solver for Atmospheric Dynamics
- Surface Wave Modeling
- Phase Field Simulations of Coarsening Dynamics
- Atomistic Simulation of Silicon Nanowires
- Runaway Electron Trajectory Simulation
- Genome Functional Annotation and Homeotic Gene Building
- Spacecraft CFD Numerical Simulation
2017:
- Extreme-scale Graph Processing Framework
- Simulation of Planetary Rings
- Simulations of Quantum Spin Liquid States via PEPS++
- Molecular Dynamics Simulation of Condensed Covalent Materials
- Cryo-EM Macromolecule Structure Determination
- Redesigning CAM-SE
- Nonlinear Earthquake Simulation
39 Issues in exascale system development
40 Major challenges for exa-scale systems
- Power consumption
- Performance obtained by applications
- Programmability
- Resilience
How to make tradeoffs between performance, power consumption, and programmability? How to achieve continuous non-stop operation? How to adapt to a wide range of applications with reasonable efficiency?
41 Architecture
- Novel architectures beyond the current heterogeneous accelerated/many-core-based ones are expected
- Co-processor or partitioned heterogeneous architecture? Low utilization of the co-processor in some applications (falling back to CPU only); bottleneck in moving data between CPU and co-processor
- Application-aware architecture: on-chip integration of special purpose units (an idea from Prof. Andrew Chien), using the right tool to do the right things; dynamically reconfigurable? how to program it?
42 Memory system
Pursuing large capacity, low latency and high bandwidth:
- Increase capacity and lower power consumption by using DRAM and NVM together; data placement becomes an issue
- Improve bandwidth and latency by using 3D stacking technology
- Reduce data movement by placing data closer to processing: HBM/HMC near the processor, on-chip DRAM, simple functions in memory
- Reduce data copy cost by using a unified memory space in heterogeneous architectures
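The data-placement issue can be illustrated with a simple bandwidth model: streaming the same working set from different tiers differs by orders of magnitude in time (a sketch; the per-tier bandwidths are illustrative assumptions, not figures from this talk):

```python
# Illustrative stream bandwidths per memory tier (GB/s, assumed values).
tiers = {"HBM near processor": 900, "DDR4 DRAM": 100, "NVM": 10}
working_set_gb = 64  # stream a 64 GB working set once

for name, bw_gbs in tiers.items():
    time_ms = working_set_gb / bw_gbs * 1000
    print(f"{name:20s}: {time_ms:7.1f} ms")
```

The spread is what makes placement policy (and hardware like 3D-stacked HBM) decisive for application performance, not just capacity.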
43 Interconnect
Pursuing low latency, high bandwidth and low energy consumption:
- Adopt new technologies: silicon photonics communication between components; optical interconnect/communication; miniature optical devices
- High scalability, meeting exascale interconnect requirements: connecting 10,000+ nodes; low-hop, low-latency topology; reliable and intelligent routing
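Connecting 10,000+ nodes with few hops is largely a question of switch radix; for example, a full three-level fat tree built from radix-r switches reaches r³/4 endpoints (a standard result, not a figure from the talk):

```python
def fat_tree_nodes(radix):
    # A full 3-level fat tree of radix-r switches supports r**3 / 4 end nodes.
    return radix ** 3 // 4

for r in (24, 36, 40, 48):
    print(f"radix {r:2d}: {fat_tree_nodes(r):6d} nodes")
```

Radix 40 already yields 16,000 endpoints, which is one reason high-radix routers (as in the Tianhe prototype above) dominate exascale interconnect designs.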
44 Programming heterogeneous systems
- Addressing the issues in programming heterogeneous parallel systems: efficient expression of parallelism, dependence, data sharing and execution semantics; problem decomposition appropriate for heterogeneous systems
- Improving programming by means of a holistic approach: new programming models; programming language extensions and compilers; parallel debugging; runtime support and optimization; architectural support
45 Computational models and algorithms
Full-chain innovation: mathematical methods, computer algorithms, algorithm implementation and optimization
- A good mathematical method is often more effective than hardware improvement and algorithm optimization
- Architecture-aware algorithm implementation and optimization is necessary for heterogeneous systems
- Domain-specific libraries for improving software productivity and performance
46 Resilience
Resilience is one of the key issues for exa-scale systems:
- Large scale of the system: 50K to 100K nodes, a huge number of components, very short MTBF
- Long non-stop operation required for solving large-scale problems
- Reliability measures required at different levels, including device, node, and system levels
- Software/hardware coordination is necessary: fast context saving and recovery for checkpointing in case of short MTBF; fault tolerance at the algorithm and application software level
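For the checkpointing mentioned above, the classic Young/Daly approximation gives the interval that balances checkpoint overhead against lost work: τ ≈ √(2·δ·M), where δ is the checkpoint cost and M the system MTBF (a standard result, shown here with illustrative numbers, not figures from the talk):

```python
import math

def optimal_checkpoint_interval(checkpoint_s, mtbf_s):
    # Young's approximation: interval = sqrt(2 * checkpoint_cost * MTBF)
    return math.sqrt(2 * checkpoint_s * mtbf_s)

delta = 120   # 2-minute checkpoint cost (illustrative)
mtbf = 3600   # 1-hour system MTBF, plausible at exascale (illustrative)
tau = optimal_checkpoint_interval(delta, mtbf)
print(f"Checkpoint every {tau / 60:.1f} minutes")  # ~15.5 minutes
```

The formula makes the slide's point quantitative: as MTBF shrinks, the optimal interval shrinks with its square root, so fast context saving and algorithm-level fault tolerance become essential.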
47 Importance of tools
- Development and optimization of large-scale parallel software require scalable tools
- Particularly important for systems built with home-grown processors, which current commercial and research tools do not support
- Three kinds of tools are required by default: a parallel debugger for correctness, a performance tuner for performance, and an energy optimizer for energy efficiency
48 Urgent need for an eco-system
- The eco-system for exa-scale systems based on home-grown processors is urgently needed: languages, compilers, OS, runtime, tools; application development support; application software
- Need to attract hardware manufacturers and third-party software developers: a product family instead of a single machine
- Collaboration between industry, academia and end-users is required
49 Thank you!
More informationINTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER. Adrian
INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER Adrian Jackson adrianj@epcc.ed.ac.uk @adrianjhpc Processors The power used by a CPU core is proportional to Clock Frequency x Voltage 2 In the past, computers
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More informationThe Red Storm System: Architecture, System Update and Performance Analysis
The Red Storm System: Architecture, System Update and Performance Analysis Douglas Doerfler, Jim Tomkins Sandia National Laboratories Center for Computation, Computers, Information and Mathematics LACSI
More informationInterconnect Your Future
Interconnect Your Future Gilad Shainer 2nd Annual MVAPICH User Group (MUG) Meeting, August 2014 Complete High-Performance Scalable Interconnect Infrastructure Comprehensive End-to-End Software Accelerators
More informationChina's supercomputer surprises U.S. experts
China's supercomputer surprises U.S. experts John Markoff Reproduced from THE HINDU, October 31, 2011 Fast forward: A journalist shoots video footage of the data storage system of the Sunway Bluelight
More informationChallenges in High Performance Computing. William Gropp
Challenges in High Performance Computing William Gropp www.cs.illinois.edu/~wgropp 2 What is HPC? High Performance Computing is the use of computing to solve challenging problems that require significant
More information创新释放高性能计算潜力 林俊 : 华为服务器领域首席架构师
创新释放高性能计算潜力 林俊 : 华为服务器领域首席架构师 Market Trends 2 2 Requirement for Compute Security Big Data Cloud Mobility Internet of Things Industry 4.0 Intelligent City 2020 Millions of MIPS Opportunity for Innovation
More informationAdvances of parallel computing. Kirill Bogachev May 2016
Advances of parallel computing Kirill Bogachev May 2016 Demands in Simulations Field development relies more and more on static and dynamic modeling of the reservoirs that has come a long way from being
More informationIntroduction CPS343. Spring Parallel and High Performance Computing. CPS343 (Parallel and HPC) Introduction Spring / 29
Introduction CPS343 Parallel and High Performance Computing Spring 2018 CPS343 (Parallel and HPC) Introduction Spring 2018 1 / 29 Outline 1 Preface Course Details Course Requirements 2 Background Definitions
More informationThe Mont-Blanc approach towards Exascale
http://www.montblanc-project.eu The Mont-Blanc approach towards Exascale Alex Ramirez Barcelona Supercomputing Center Disclaimer: Not only I speak for myself... All references to unavailable products are
More informationRace to Exascale: Opportunities and Challenges. Avinash Sodani, Ph.D. Chief Architect MIC Processor Intel Corporation
Race to Exascale: Opportunities and Challenges Avinash Sodani, Ph.D. Chief Architect MIC Processor Intel Corporation Exascale Goal: 1-ExaFlops (10 18 ) within 20 MW by 2018 1 ZFlops 100 EFlops 10 EFlops
More informationSearch for Optimal Network Topologies for Supercomputers 寻找超级计算机优化的网络拓扑结构
Search for Optimal Network Topologies for Supercomputers 寻找超级计算机优化的网络拓扑结构 GUO, Meng 郭猛 guomeng@sdas.org Shandong Computer Science Center (National Supercomputer Center in Jinan) 山东省计算中心 ( 国家超级计算济南中心 )
More informationThe Future of High Performance Interconnects
The Future of High Performance Interconnects Ashrut Ambastha HPC Advisory Council Perth, Australia :: August 2017 When Algorithms Go Rogue 2017 Mellanox Technologies 2 When Algorithms Go Rogue 2017 Mellanox
More informationStockholm Brain Institute Blue Gene/L
Stockholm Brain Institute Blue Gene/L 1 Stockholm Brain Institute Blue Gene/L 2 IBM Systems & Technology Group and IBM Research IBM Blue Gene /P - An Overview of a Petaflop Capable System Carl G. Tengwall
More informationDynamical Exascale Entry Platform
DEEP Dynamical Exascale Entry Platform 2 nd IS-ENES Workshop on High performance computing for climate models 30.01.2013, Toulouse, France Estela Suarez The research leading to these results has received
More informationThe Road to ExaScale. Advances in High-Performance Interconnect Infrastructure. September 2011
The Road to ExaScale Advances in High-Performance Interconnect Infrastructure September 2011 diego@mellanox.com ExaScale Computing Ambitious Challenges Foster Progress Demand Research Institutes, Universities
More informationHigh performance Computing and O&G Challenges
High performance Computing and O&G Challenges 2 Seismic exploration challenges High Performance Computing and O&G challenges Worldwide Context Seismic,sub-surface imaging Computing Power needs Accelerating
More informationPerformance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA
Performance Optimizations via Connect-IB and Dynamically Connected Transport Service for Maximum Performance on LS-DYNA Pak Lui, Gilad Shainer, Brian Klaff Mellanox Technologies Abstract From concept to
More informationMapping MPI+X Applications to Multi-GPU Architectures
Mapping MPI+X Applications to Multi-GPU Architectures A Performance-Portable Approach Edgar A. León Computer Scientist San Jose, CA March 28, 2018 GPU Technology Conference This work was performed under
More informationTSUBAME-KFC : Ultra Green Supercomputing Testbed
TSUBAME-KFC : Ultra Green Supercomputing Testbed Toshio Endo,Akira Nukada, Satoshi Matsuoka TSUBAME-KFC is developed by GSIC, Tokyo Institute of Technology NEC, NVIDIA, Green Revolution Cooling, SUPERMICRO,
More informationTowards Exascale Computing with the Atmospheric Model NUMA
Towards Exascale Computing with the Atmospheric Model NUMA Andreas Müller, Daniel S. Abdi, Michal Kopera, Lucas Wilcox, Francis X. Giraldo Department of Applied Mathematics Naval Postgraduate School, Monterey
More informationAccelerating High Performance Computing.
Accelerating High Performance Computing http://www.nvidia.com/tesla Computing The 3 rd Pillar of Science Drug Design Molecular Dynamics Seismic Imaging Reverse Time Migration Automotive Design Computational
More informationIntel Many Integrated Core (MIC) Architecture
Intel Many Integrated Core (MIC) Architecture Karl Solchenbach Director European Exascale Labs BMW2011, November 3, 2011 1 Notice and Disclaimers Notice: This document contains information on products
More informationThe Stampede is Coming: A New Petascale Resource for the Open Science Community
The Stampede is Coming: A New Petascale Resource for the Open Science Community Jay Boisseau Texas Advanced Computing Center boisseau@tacc.utexas.edu Stampede: Solicitation US National Science Foundation
More informationGame-changing Extreme GPU computing with The Dell PowerEdge C4130
Game-changing Extreme GPU computing with The Dell PowerEdge C4130 A Dell Technical White Paper This white paper describes the system architecture and performance characterization of the PowerEdge C4130.
More informationThe Earth Simulator Current Status
The Earth Simulator Current Status SC13. 2013 Ken ichi Itakura (Earth Simulator Center, JAMSTEC) http://www.jamstec.go.jp 2013 SC13 NEC BOOTH PRESENTATION 1 JAMSTEC Organization Japan Agency for Marine-Earth
More informationManaging HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory
Managing HPC Active Archive Storage with HPSS RAIT at Oak Ridge National Laboratory Quinn Mitchell HPC UNIX/LINUX Storage Systems ORNL is managed by UT-Battelle for the US Department of Energy U.S. Department
More informationHPC Algorithms and Applications
HPC Algorithms and Applications Intro Michael Bader Winter 2015/2016 Intro, Winter 2015/2016 1 Part I Scientific Computing and Numerical Simulation Intro, Winter 2015/2016 2 The Simulation Pipeline phenomenon,
More informationIt s a Multicore World. John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist
It s a Multicore World John Urbanic Pittsburgh Supercomputing Center Parallel Computing Scientist Waiting for Moore s Law to save your serial code started getting bleak in 2004 Source: published SPECInt
More informationFujitsu HPC Roadmap Beyond Petascale Computing. Toshiyuki Shimizu Fujitsu Limited
Fujitsu HPC Roadmap Beyond Petascale Computing Toshiyuki Shimizu Fujitsu Limited Outline Mission and HPC product portfolio K computer*, Fujitsu PRIMEHPC, and the future K computer and PRIMEHPC FX10 Post-FX10,
More informationIntroduction of Oakforest-PACS
Introduction of Oakforest-PACS Hiroshi Nakamura Director of Information Technology Center The Univ. of Tokyo (Director of JCAHPC) Outline Supercomputer deployment plan in Japan What is JCAHPC? Oakforest-PACS
More informationDistributed Dense Linear Algebra on Heterogeneous Architectures. George Bosilca
Distributed Dense Linear Algebra on Heterogeneous Architectures George Bosilca bosilca@eecs.utk.edu Centraro, Italy June 2010 Factors that Necessitate to Redesign of Our Software» Steepness of the ascent
More informationPedraforca: a First ARM + GPU Cluster for HPC
www.bsc.es Pedraforca: a First ARM + GPU Cluster for HPC Nikola Puzovic, Alex Ramirez We ve hit the power wall ALL computers are limited by power consumption Energy-efficient approaches Multi-core Fujitsu
More informationHPC Innovation Lab Update. Dell EMC HPC Community Meeting 3/28/2017
HPC Innovation Lab Update Dell EMC HPC Community Meeting 3/28/2017 Dell EMC HPC Innovation Lab charter Design, develop and integrate Heading HPC systems Lorem ipsum Flexible reference dolor sit amet, architectures
More informationTimothy Lanfear, NVIDIA HPC
GPU COMPUTING AND THE Timothy Lanfear, NVIDIA FUTURE OF HPC Exascale Computing will Enable Transformational Science Results First-principles simulation of combustion for new high-efficiency, lowemision
More informationAtos announces the Bull sequana X1000 the first exascale-class supercomputer. Jakub Venc
Atos announces the Bull sequana X1000 the first exascale-class supercomputer Jakub Venc The world is changing The world is changing Digital simulation will be the key contributor to overcome 21 st century
More informationIn-Network Computing. Paving the Road to Exascale. 5th Annual MVAPICH User Group (MUG) Meeting, August 2017
In-Network Computing Paving the Road to Exascale 5th Annual MVAPICH User Group (MUG) Meeting, August 2017 Exponential Data Growth The Need for Intelligent and Faster Interconnect CPU-Centric (Onload) Data-Centric
More informationINTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER. Adrian
INTRODUCTION TO THE ARCHER KNIGHTS LANDING CLUSTER Adrian Jackson a.jackson@epcc.ed.ac.uk @adrianjhpc Processors The power used by a CPU core is proportional to Clock Frequency x Voltage 2 In the past,
More informationC-DAC HPC Trends & Activities in India. Abhishek Das Scientist & Team Leader HPC Solutions Group C-DAC Ministry of Communications & IT Govt of India
C-DAC HPC Trends & Activities in India Abhishek Das Scientist & Team Leader HPC Solutions Group C-DAC Ministry of Communications & IT Govt of India Presentation Outline A brief profile of C-DAC, India
More informationCray XD1 Supercomputer Release 1.3 CRAY XD1 DATASHEET
CRAY XD1 DATASHEET Cray XD1 Supercomputer Release 1.3 Purpose-built for HPC delivers exceptional application performance Affordable power designed for a broad range of HPC workloads and budgets Linux,
More informationHigh Performance Computing with Fujitsu
High Performance Computing with Fujitsu Ivo Doležel 0 2017 FUJITSU FUJITSU Software HPC Cluster Suite A complete HPC software stack solution HPC cluster general characteristics HPC clusters consist primarily
More informationHigh Performance Computing with Accelerators
High Performance Computing with Accelerators Volodymyr Kindratenko Innovative Systems Laboratory @ NCSA Institute for Advanced Computing Applications and Technologies (IACAT) National Center for Supercomputing
More informationINSPUR and HPC Innovation. Dong Qi (Forrest) Oversea PM
INSPUR and HPC Innovation Dong Qi (Forrest) Oversea PM dongqi@inspur.com Contents 1 2 3 4 5 Inspur introduction HPC Challenge and Inspur HPC strategy HPC cases Inspur contribution to HPC community Inspur
More informationPractical Scientific Computing
Practical Scientific Computing Performance-optimized Programming Preliminary discussion: July 11, 2008 Dr. Ralf-Peter Mundani, mundani@tum.de Dipl.-Ing. Ioan Lucian Muntean, muntean@in.tum.de MSc. Csaba
More informationCOMP 633 Parallel Computing.
COMP 633 Parallel Computing http://www.cs.unc.edu/~prins/classes/633/ Parallel computing What is it? multiple processors cooperating to solve a single problem hopefully faster than a single processor!
More informationPORTING CP2K TO THE INTEL XEON PHI. ARCHER Technical Forum, Wed 30 th July Iain Bethune
PORTING CP2K TO THE INTEL XEON PHI ARCHER Technical Forum, Wed 30 th July Iain Bethune (ibethune@epcc.ed.ac.uk) Outline Xeon Phi Overview Porting CP2K to Xeon Phi Performance Results Lessons Learned Further
More information