Post-K Supercomputer with Fujitsu's Original CPU, A64FX Powered by Arm ISA
Transcription
1 Post-K Supercomputer with Fujitsu's Original CPU, A64FX Powered by Arm ISA. Toshiyuki Shimizu, Nov. 15th, 2018. Post-K is under development; information in these slides is subject to change without notice.
2 Agenda: Background; K computer and Post-K; Post-K goals and approaches; Fujitsu new Arm CPU A64FX; Post-K system; System software; Configuration; Performance discussion; Summary and development status.
4 K computer, PRIMEHPC, and Post-K.
FUJITSU PRIMEHPC FX10: 1.8x CPU perf.; easier installation.
PRIMEHPC FX100: 4x (DP) / 8x (SP) CPU & Tofu2; high-density pkg & lower energy.
Japan's National Projects: Development and operation of K computer; HPCI strategic apps program; App. review; FS projects; Post-K computer development.
Technical Computing Suite (TCS): handles millions of parallel jobs. FEFS: super scalable file system. OS: lower OS jitter w/ assistant core. MPI: ultra scalable collective communication libraries.
K computer, PRIMEHPC FX: many applications are running and being developed for science and various industries.
System software: TCS supports hardware with newly developed technologies.
Post-K: RIKEN and Fujitsu are working together for Post-K. OS is running and design verification is proceeding as scheduled.
5 Post-K goals and approaches.
Post-K goals: high application performance and good power efficiency; good usability and better accessibility for users; keeping application compatibility while advancing from predecessors.
Our approaches: Develops high performance and scalable custom CPU cores. Performance: wider SIMD & mathematical acc. primitives, high memory BW. Scalability: scalable interconnect Tofu. Power efficiency: the best device tech, power control functions, optimal resources. Adopts Arm ISA, binary compatibility. Maintains performance balance and supports advanced features.
7 Fujitsu new Arm CPU: A64FX. A64FX approach; Specs and technologies; Tofu interconnect D; Power management; A64FX summary.
8 A64FX: Approach. High performance Arm CPU: develops original CPU core supporting Arm SVE (Scalable Vector Extension). Targeting high performance servers: floating point calculations, high memory bandwidth, low power consumption, scalable performance and configuration. Extension for emerging applications: half precision (FP16) & INT16/8 partial dot product support. High BW/Calc. is the key for real apps.
9 A64FX: Specs & technologies. CPU generations and key parameters.
CPU: SPARC64 VIIIfx, SPARC64 IXfx, SPARC64 XIfx, A64FX.
1st system w/ CPU: K, FX10, FX100, Post-K.
Parameters compared: Si tech. (nm), Core perf. (GFLOPS), # of cores, Chip perf. (TFLOPS), Memory BW (GB/s), B/F (Bytes/FLOP). A64FX: core perf. 57~ GFLOPS, memory BW 1024 GB/s.
Combination of technologies: SVE increases core performance; CMG is a scalable architecture to increase # of cores; HBM enables high bandwidth.
10 A64FX technologies: Core performance. High calc. throughput of Fujitsu original CPU core w/ SVE: 512-bit wide SIMD x 2 pipelines and new integer functions.
Figure: core peak performance (GOPS), multiply and add by element size (32-bit, 16-bit, 8-bit) plus INT8 partial dot product; labeled values: >57, >230, >460.
INT8 partial dot product: four 8-bit elements A0 to A3 are multiplied by B0 to B3 and the products are summed into a 32-bit C.
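The INT8 partial dot product on the slide above can be illustrated with a small scalar model. This is only a sketch of the arithmetic for one 32-bit lane; it assumes signed 8-bit inputs and wrap-around on 32-bit overflow, and the function name `int8_partial_dot` is hypothetical, not an SVE intrinsic.

```python
def int8_partial_dot(acc, a, b):
    """Add the dot product of four signed 8-bit pairs to a 32-bit accumulator.

    Assumptions (not stated on the slide): inputs are signed, and the
    result wraps around like a signed 32-bit register.
    """
    if len(a) != 4 or len(b) != 4:
        raise ValueError("expected four 8-bit elements per operand")
    for value in list(a) + list(b):
        if not -128 <= value <= 127:
            raise ValueError("element does not fit in signed 8 bits")
    total = acc + sum(x * y for x, y in zip(a, b))
    # Wrap to a signed 32-bit value.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total & 0x80000000 else total


# A0..A3 = 1..4, B0..B3 = 5..8, C starts at 10.
print(int8_partial_dot(10, [1, 2, 3, 4], [5, 6, 7, 8]))  # 80
```

Packing four 8-bit products into each 32-bit result is what lets the 8-bit case reach a higher operation count per cycle than a plain 32-bit multiply and add.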
11 A64FX technologies: Scalable architecture. Core Memory Group (CMG): 12 compute cores and an assistant core for OS daemon, I/O, etc.; shared L2 cache; dedicated memory controller. Four CMGs maintain cache coherence w/ on-chip directory. Thread binding within a CMG allows linear speedup of up to 48 compute cores.
Figure: A64FX package configuration, showing 13 cores per CMG, X-Bar connection, L2 cache (8 MiB, 16-way), memory controller, HBM2, network on chip, Tofu controller, and PCIe controller.
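To make the CMG layout concrete, the sketch below groups threads by CMG for a compact placement. It assumes compute cores are numbered 0 to 47 contiguously, 12 per CMG; that numbering and the helper names `cmg_of` and `bind_compact` are illustrative assumptions, not something the slides specify.

```python
NUM_CMGS = 4          # from the slide: four CMGs per chip
CORES_PER_CMG = 12    # from the slide: 12 compute cores per CMG


def cmg_of(core_id):
    """Return the CMG index of a compute core (assumed contiguous numbering)."""
    if not 0 <= core_id < NUM_CMGS * CORES_PER_CMG:
        raise ValueError("compute core id out of range")
    return core_id // CORES_PER_CMG


def bind_compact(num_threads):
    """Place threads on cores in order, so one CMG fills before the next."""
    if not 0 <= num_threads <= NUM_CMGS * CORES_PER_CMG:
        raise ValueError("more threads than compute cores")
    placement = {}
    for thread in range(num_threads):
        placement.setdefault(cmg_of(thread), []).append(thread)
    return placement


# 16 threads: 12 land in CMG 0, the remaining 4 in CMG 1.
print(bind_compact(16))
```

Keeping a group of cooperating threads inside one CMG means they share that CMG's L2 cache and memory controller, which is the situation the slide's thread-binding remark refers to.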
12 A64FX: L1D cache uncompromised BW. 128B/cycle sustained BW even for unaligned SIMD load: read port0 and read port1 of the L1D$ each deliver 64B/cycle. Combined Gather doubles gather (indirect) load's data throughput when the target elements are within a 128-byte aligned block for a pair of two regs, even & odd; it maximizes BW to 32 bytes/cycle.
Figure: memory-to-register flows of 8B elements within a 128B block.
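The Combined Gather condition can be sketched as an address check. The code below is a rough model under stated assumptions: it treats consecutive (even, odd) gather elements as the pair in question and uses 8-byte elements; the hardware's exact pairing rule is not given on the slide, and `combinable_pairs` is a hypothetical helper.

```python
BLOCK_BYTES = 128  # aligned block size named on the slide


def block_index(addr, size):
    """Return the aligned block holding the element, or None if it straddles two."""
    first = addr // BLOCK_BYTES
    last = (addr + size - 1) // BLOCK_BYTES
    return first if first == last else None


def combinable_pairs(addresses, elem_size=8):
    """For each (even, odd) pair, report whether both elements share one block."""
    result = []
    for i in range(0, len(addresses) - 1, 2):
        even = block_index(addresses[i], elem_size)
        odd = block_index(addresses[i + 1], elem_size)
        result.append(even is not None and even == odd)
    return result


# First pair sits in block 0; second pair spans blocks 1 and 2.
print(combinable_pairs([0, 120, 128, 300]))  # [True, False]
```

Under this model, index arrays with good spatial locality would benefit from the combined path, while widely scattered indices would fall back to one element per access.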
13 A64FX: Tofu interconnect D. Integrated w/ rich resources. Increased TNIs achieve higher injection BW & flexible comm. patterns. Increased barrier resources allow flexible collective comm. algorithms. Memory bypassing achieves low latency: direct descriptor & cache injection.
TofuD spec: data rate (Gbps); link bandwidth 6.8 GB/s; injection bandwidth 40.8 GB/s.
Measured: put throughput 6.35 GB/s; PingPong latency 0.49~0.54 µs.
Figure: A64FX with HBM2, NOC, PCIe, and TNI0 to TNI5 connected to the Tofu Network Router (2 lanes, 10 ports) of TofuD.
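The TofuD figures on this slide are consistent with simple arithmetic, sketched below. The assumption is that injection bandwidth is the link bandwidth multiplied by the six TNIs (TNI0 to TNI5) drawn in the figure; the slide lists the numbers but does not state that formula.

```python
LINK_BW_GB_S = 6.8        # link bandwidth from the slide
NUM_TNIS = 6              # TNI0..TNI5 in the figure
MEASURED_PUT_GB_S = 6.35  # measured put throughput from the slide

# Assumed relation: every TNI can drive one link at full rate.
injection_bw = NUM_TNIS * LINK_BW_GB_S

# Fraction of the nominal link bandwidth reached by a single put stream.
put_efficiency = MEASURED_PUT_GB_S / LINK_BW_GB_S

print(f"injection bandwidth: {injection_bw:.1f} GB/s")
print(f"put efficiency: {put_efficiency:.1%}")
```

The computed injection bandwidth matches the 40.8 GB/s quoted on the slide, and the measured put throughput comes to roughly 93% of the link rate.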
14 A64FX: Power monitor and analyzer.
Energy monitor (per chip): node power via Power API*1 (~msec); averaged power of a node (cores, an L2 cache, a memory), etc.
Energy analyzer (per core): power profiler via PAPI*2 (~nsec); fine grained power analysis of a core, an L2 cache and a memory.
*1: Sandia National Laboratory. *2: Performance Application Programming Interface.
Figure: in each CMG (#0 to #3) the energy analyzer derives power calc. from core activity, L2 cache activity, and memory activity; the energy monitor covers the node: 12 cores, L2 cache, HBM2, assistant core, plus power calc. for Tofu and for PCIe.
15 A64FX: Power Knobs to reduce power consumption. Power knob limits units' activity via user APIs. Performance/W would be optimized by utilizing Power knobs, Energy monitor & analyzer.
Figure: pipeline stages Fetch, Issue, Dispatch, Reg-read, Execute, Cache and Memory, Commit. Units shown: L1 I$, Branch Predictor, Decode & Issue, RSA, RSE0, RSE1, RSBR, PGPR, PPR, PFPR, EAGA, EAGB, EXA, EXB, EXC, EXD, FLA, FLB, PRX, Fetch Port, Store Port, Write Buffer, L1D$, CSE, PC, Control Registers, L2$, Tofu controller, HBM2 Controller, PCI Controller, 52 cores.
Knobs: frequency reduction; decode width 4 to 2; EXA pipeline only; FLA pipeline only; HBM2 B/W adjust (units of 10%).
16 A64FX: Summary. Arm SVE, high performance and efficiency: DP performance >2.7 TFLOPS, memory BW 1024 GB/s, Triad.
Figure: four CMGs (Core Memory Group), each with 12x compute cores and 1x assistant core and its own HBM2, connected by the NOC (Network on Chip) to the PCIe controller and Tofu interface.
A64FX specifications:
ISA (Base, extension): Armv8.2-A, SVE
Process technology: 7 nm
Peak DP performance: >2.7 TFLOPS
SIMD width: 512-bit
# of compute cores: 48
Memory capacity: 32 GiB (HBM2 x4)
Memory peak bandwidth: 1024 GB/s
PCIe: Gen3 16 lanes
High speed interconnect: TofuD integrated
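The B/F (Bytes/FLOP) ratio that the deck uses as a balance metric can be recomputed from the summary figures. This is a minimal sketch: because the slides give peak performance only as lower bounds (>2.7 TFLOPS per chip, >57 GFLOPS per core), the results here are approximate bounds rather than exact specifications.

```python
PEAK_DP_TFLOPS = 2.7      # chip peak, stated as ">2.7 TFLOPS"
MEMORY_BW_GB_S = 1024.0   # HBM2 peak bandwidth
COMPUTE_CORES = 48
CORE_PERF_GFLOPS = 57.0   # per core, stated as ">57"

# Bytes of memory bandwidth available per floating point operation.
bytes_per_flop = MEMORY_BW_GB_S / (PEAK_DP_TFLOPS * 1000.0)

# Cross-check: per-core performance times core count, in TFLOPS.
chip_from_cores_tflops = COMPUTE_CORES * CORE_PERF_GFLOPS / 1000.0

print(f"B/F is about {bytes_per_flop:.2f}")
print(f"48 cores x 57 GFLOPS = {chip_from_cores_tflops:.3f} TFLOPS")
```

The per-core and per-chip figures agree with each other (48 x 57 GFLOPS is slightly above 2.7 TFLOPS), and the resulting ratio of just under 0.4 bytes per FLOP reflects the deck's emphasis on memory bandwidth for real applications.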
18 Software development: System software. RIKEN and Fujitsu are developing software stacks for Post-K (Fujitsu Technical Computing Suite / RIKEN developing system software; Post-K under development w/ RIKEN).
Management software: system management for high availability & power saving operation; job management for higher system utilization & power efficiency.
File system: FEFS (Lustre-based distributed file system); LLIO (NVM-based file I/O accelerator).
Programming environment: XcalableMP; MPI (Open MPI, MPICH); OpenMP, COARRAY, Math. libs.; compilers (C, C++, Fortran); debugging and tuning tools.
Stack layers: Post-K applications; Linux OS / McKernel (lightweight kernel); Post-K system hardware.
19 FEFS vs Lustre, LLIO vs Burst buffer. Post-K supports superior performance file system, FEFS + LLIO.
FEFS vs Lustre:
Storage network: FEFS: Tofu, InfiniBand, Ether, OPA; Lustre: InfiniBand, Ether, OPA
Injection BW per node: FEFS: 40.8 GB/s; Lustre: 12.5 GB/s (IB EDR)
MDS scalability: FEFS: Yes; Lustre: Yes (ver. 2.4~)
Interoperability: FEFS: x86, Arm (A64FX); Lustre: x86, Power, Arm?
FEFS + LLIO* vs Lustre + Burst buffer**:
Local temporary file acc.: FEFS + LLIO: Yes; Lustre + Burst buffer: No
Cache file acc.: FEFS + LLIO: Yes; Lustre + Burst buffer: Yes
Shared file within a job: FEFS + LLIO: Yes; Lustre + Burst buffer: Yes
Explicit function call: FEFS + LLIO: Not required (by mount point); Lustre + Burst buffer: Required (source modification)
*Lightweight Layered IO-Accelerator. **DDN IME.
20 LLIO (Lightweight Layered IO-Accelerator) overview. Maximizing application file I/O performance. Easy access to user data: file cache of global file system. Higher data access performance: temporary local FS (in a process). Higher data sharing performance: temporary shared FS (among processes).
Figure: Job A and Job B run on nodes with App. on compute cores and file I/O on assistant cores; 1st level: file1 (local), file2 (cache), file3 (shared) on temporary, cache, and local file systems; 2nd level: file4 on the global file system (Lustre based).
21 Post-K system configuration. Scalable design: CPU, CMU, BoB, Shelf, Rack.
Unit: # of nodes: Description
CPU: 1: Single socket node with HBM2 & Tofu interconnect D
CMU: 2: CPU Memory Unit: 2x CPU
BoB: 16: Bunch of Blades: 8x CMU
Shelf: 48: 3x BoB
Rack: 384: 8x Shelf
22 CMU: CPU Memory Unit. A64FX CPU x2. QSFP28 x3 for Active Optical Cables. Single-side blind mate of signals & water. ~100% direct water cooling.
Figure: CMU with water connections, AOCs, QSFP28 (X), QSFP28 (Y), QSFP28 (Z), and electrical signals.
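The node counts in the configuration table follow from multiplying the packaging factors level by level. A short sketch, using only the factors listed on the slide (the `HIERARCHY` structure and function name are illustrative):

```python
# (unit, child unit, children per unit), taken from the table's descriptions.
HIERARCHY = [
    ("CMU", "CPU", 2),     # CPU Memory Unit: 2x CPU
    ("BoB", "CMU", 8),     # Bunch of Blades: 8x CMU
    ("Shelf", "BoB", 3),   # 3x BoB
    ("Rack", "Shelf", 8),  # 8x Shelf
]


def nodes_per_unit():
    """Return the number of single-socket nodes contained in each unit."""
    counts = {"CPU": 1}
    for unit, child, factor in HIERARCHY:
        counts[unit] = counts[child] * factor
    return counts


print(nodes_per_unit())
# {'CPU': 1, 'CMU': 2, 'BoB': 16, 'Shelf': 48, 'Rack': 384}
```

With 48 compute cores per node, the same multiplication gives 384 x 48 = 18,432 compute cores per rack.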
24 Preliminary performance evaluation results. Over 2.5x faster in HPC & AI benchmarks than SPARC64 XIfx, from Arm SVE, Fujitsu's microarchitecture, and high bandwidth.
Figure: A64FX chip performance measurements & architectural contributions; performance increase normalized by SPARC64 XIfx (baseline). Categories: HPC throughput tests (DGEMM, STREAM Triad); application kernels (Fluid dynamics, Atmosphere, Seismic wave propagation); AI (Convolution FP32, Convolution low precision). Labeled values: 2.5TF, 840 GB/s, 3.0x, 2.8x, 3.4x, 2.5x, 9.4x. Contributing factors: memory BW, 512-bit SIMD, combined gather, L1$ BW, L2$ BW, INT8 partial dot product.
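The two absolute numbers on this slide can be related to the chip's peak figures. The pairing used below (2.5 TF with DGEMM, 840 GB/s with STREAM Triad) is an assumption inferred from the units, and the peak values come from the A64FX summary slide, so treat the percentages as rough estimates.

```python
PEAK_DP_TFLOPS = 2.7       # stated as ">2.7 TFLOPS", so a lower bound
PEAK_MEM_BW_GB_S = 1024.0  # HBM2 peak bandwidth

DGEMM_TFLOPS = 2.5         # assumed to be the DGEMM result
STREAM_TRIAD_GB_S = 840.0  # assumed to be the STREAM Triad result

# Because the true peak is above 2.7 TFLOPS, this is an upper bound.
dgemm_efficiency = DGEMM_TFLOPS / PEAK_DP_TFLOPS
triad_efficiency = STREAM_TRIAD_GB_S / PEAK_MEM_BW_GB_S

print(f"DGEMM: at most {dgemm_efficiency:.1%} of peak")
print(f"STREAM Triad: {triad_efficiency:.1%} of peak bandwidth")
```

Both ratios being high is in line with the slide's message that the measured gains come from compute throughput and memory bandwidth together.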
26 Summary of Post-K.
High application performance and good power efficiency: high memory bandwidth & wider SIMD.
Arm SVE support: Scalable Vector Extension, state-of-the-art Arm instruction set architecture. Binary compatibility advances and expands the power of the ecosystem by incorporating HPC technologies and applications.
Optimizing for new supercomputer standards: low power consumption in many aspects; focusing on both existing applications and emerging applications.
Post-K, inheriting the strong and proven microarchitecture with HPC system software, will be more powerful with Arm and its ecosystem.
27 System development status: Post-K. Fujitsu is currently developing Post-K, successor to K computer, with RIKEN. OS is running and design verification is underway as scheduled. RIKEN announced Post-K early access program to begin around Q2/CY.
Figure: Post-K prototype rack.
Xpander Rak Mount 8 5U Gen 3 User Guide Xpander Rak Mount 8 5U is a rak mount PCI Express (PCIe) expansion enlosure that enables onnetion of 8 double-wide graphis or other ontrollers with top-onneted auxiliary
More informationWhat are Cycle-Stealing Systems Good For? A Detailed Performance Model Case Study
What are Cyle-Stealing Systems Good For? A Detailed Performane Model Case Study Wayne Kelly and Jiro Sumitomo Queensland University of Tehnology, Australia {w.kelly, j.sumitomo}@qut.edu.au Abstrat The
More informationPreparing GPU-Accelerated Applications for the Summit Supercomputer
Preparing GPU-Accelerated Applications for the Summit Supercomputer Fernanda Foertter HPC User Assistance Group Training Lead foertterfs@ornl.gov This research used resources of the Oak Ridge Leadership
More informationMulti-Piece Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality
INTERNATIONAL CONFERENCE ON MANUFACTURING AUTOMATION (ICMA200) Multi-Piee Mold Design Based on Linear Mixed-Integer Program Toward Guaranteed Optimality Stephen Stoyan, Yong Chen* Epstein Department of
More informationExperiences of the Development of the Supercomputers
Experiences of the Development of the Supercomputers - Earth Simulator and K Computer YOKOKAWA, Mitsuo Kobe University/RIKEN AICS Application Oriented Systems Developed in Japan No.1 systems in TOP500
More informationTofu Interconnect 2: System-on-Chip Integration of High-Performance Interconnect
Tofu Interconnect 2: System-on-Chip Integration of High-Performance Interconnect Yuichiro Ajima, Tomohiro Inoue, Shinya Hiramoto, Shunji Uno, Shinji Sumimoto, Kenichi Miura, Naoyuki Shida, Takahiro Kawashima,
More informationStarter Kit B80 User Guide. Version: 01 DocId: Starter_Kit_B80_v01. User Guide
Starter Kit B80 User Guide Version: 01 DoId: Starter_Kit_B80_v01 User Guide User Guide: Starter Kit B80 User Guide Version: 01 Date: 2012-08-14 DoId: Status Starter_Kit_B80_v01 GENERAL NOTE THE USE OF
More informationWhat does Heterogeneity bring?
What does Heterogeneity bring? Ken Koch Scientific Advisor, CCS-DO, LANL LACSI 2006 Conference October 18, 2006 Some Terminology Homogeneous Of the same or similar nature or kind Uniform in structure or
More informationDesign of High Speed Mac Unit
Design of High Speed Ma Unit 1 Harish Babu N, 2 Rajeev Pankaj N 1 PG Student, 2 Assistant professor Shools of Eletronis Engineering, VIT University, Vellore -632014, TamilNadu, India. 1 harishharsha72@gmail.om,
More informationVOLTA: PROGRAMMABILITY AND PERFORMANCE. Jack Choquette NVIDIA Hot Chips 2017
VOLTA: PROGRAMMABILITY AND PERFORMANCE Jack Choquette NVIDIA Hot Chips 2017 1 TESLA V100 21B transistors 815 mm 2 80 SM 5120 CUDA Cores 640 Tensor Cores 16 GB HBM2 900 GB/s HBM2 300 GB/s NVLink *full GV100
More informationFujitsu s Technologies Leading to Practical Petascale Computing: K computer, PRIMEHPC FX10 and the Future
Fujitsu s Technologies Leading to Practical Petascale Computing: K computer, PRIMEHPC FX10 and the Future November 16 th, 2011 Motoi Okuda Technical Computing Solution Unit Fujitsu Limited Agenda Achievements
More informationChallenges in Developing Highly Reliable HPC systems
Dec. 1, 2012 JS International Symopsium on DVLSI Systems 2012 hallenges in Developing Highly Reliable HP systems Koichiro akayama Fujitsu Limited K computer Developed jointly by RIKEN and Fujitsu First
More informationBrand-New Vector Supercomputer
Brand-New Vector Supercomputer NEC Corporation IT Platform Division Shintaro MOMOSE SC13 1 New Product NEC Released A Brand-New Vector Supercomputer, SX-ACE Just Now. Vector Supercomputer for Memory Bandwidth
More informationin Action Fujitsu High Performance Computing Ecosystem Human Centric Innovation Innovation Flexibility Simplicity
Fujitsu High Performance Computing Ecosystem Human Centric Innovation in Action Dr. Pierre Lagier Chief Technology Officer Fujitsu Systems Europe Innovation Flexibility Simplicity INTERNAL USE ONLY 0 Copyright
More informationGPUS FOR NGVLA. M Clark, April 2015
S FOR NGVLA M Clark, April 2015 GAMING DESIGN ENTERPRISE VIRTUALIZATION HPC & CLOUD SERVICE PROVIDERS AUTONOMOUS MACHINES PC DATA CENTER MOBILE The World Leader in Visual Computing 2 What is a? Tesla K40
More informationCurrent and Future Challenges of the Tofu Interconnect for Emerging Applications
Current and Future Challenges of the Tofu Interconnect for Emerging Applications Yuichiro Ajima Senior Architect Next Generation Technical Computing Unit Fujitsu Limited June 22, 2017, ExaComm 2017 Workshop
More informationPerformance Improvement of TCP on Wireless Cellular Networks by Adaptive FEC Combined with Explicit Loss Notification
erformane Improvement of TC on Wireless Cellular Networks by Adaptive Combined with Expliit Loss tifiation Masahiro Miyoshi, Masashi Sugano, Masayuki Murata Department of Infomatis and Mathematial Siene,
More informationCA Unified Infrastructure Management 8.x Implementation Proven Professional Exam (CAT-540) Study Guide Version 1.1
Management 8.x Implementation Proven Professional Exam (CAT-540) Study Guide Version 1.1 PROPRIETARY AND CONFIDENTIAL INFORMATION 2017 CA. All rights reserved. CA onfidential & proprietary information.
More informationDesign and Evaluation of a 2048 Core Cluster System
Design and Evaluation of a 2048 Core Cluster System, Torsten Höfler, Torsten Mehlan and Wolfgang Rehm Computer Architecture Group Department of Computer Science Chemnitz University of Technology December
More information