Technical Computing Suite supporting the hybrid system
- Merry Ann Carter
1 Technical Computing Suite supporting the hybrid system
- Supercomputer PRIMEHPC FX10
- PRIMERGY x86 cluster
2 Hybrid System Configuration
- Supercomputer PRIMEHPC FX10: 6D mesh/torus interconnect (Tofu)
- PRIMERGY x86 cluster: fat-tree interconnect (InfiniBand)
- Local file system (temporary area occupied by jobs)
- Global file system (data storage area)
- IO network (IB), management network (GbE)
- Login nodes (user): login, compilation, job submission
- Management nodes (administrator): control nodes, job management nodes, file management nodes
- System operations management and job operations management
3 System Software Stack
- User/ISV applications
- HPC Portal / System Management Portal
- Technical Computing Suite
  - System operations management: system configuration management, system control, system monitoring, system installation & operation
  - Job operations management: job manager, job scheduler, resource management, parallel execution environment
  - High-performance file system: Lustre-based distributed file system, high scalability, IO bandwidth guarantee, high reliability & availability
  - VISIMPACT(TM): shared L2 cache on a chip, hardware intra-processor synchronization
  - Compilers: hybrid parallel programming, sector cache support, SIMD / register file extensions
  - Support tools: IDE, profiler & tuning tools, interactive debugger
  - MPI library: Scalability of High-Func. Barrier Comm.
- Operating system
  - Supercomputer PRIMEHPC FX10: Linux-based enhanced operating system
  - PRIMERGY x86 cluster: Red Hat Enterprise Linux
4 System Operations Management
- Single system image in FX10 and PRIMERGY
  - Installation / update packages
  - High availability control
  - Hardware/software monitoring
  - Power control
- Hierarchical structure for large-scale systems: load balancing by using the job management sub nodes
- Logical resource partitions for efficient operation
- Easy to operate with a single system image
[Figure: administrator, control node, and logical resource partitions: Resource Unit #1 (PRIMEHPC FX10: job management sub node, IO nodes, nodes) and Resource Unit #2 (PRIMERGY: job management sub node, nodes).]
5 Installation / Update Packages
- Large-scale system support
  - Hierarchical installer node structure: installer node (with repository), sub installer nodes, broadcast/multicast to nodes
  - Used for both installation and package updates
- 2-tier (common/additional) package management
  - Common packages: on all nodes
  - Additional packages: on some nodes
- Supports diskless nodes for FX10
[Figure: common package (PKG-A, PKG-B, PKG-C), additional package-1 (PKG-D, PKG-E) and additional package-2 (PKG-F) distributed over node1 to node4.]
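The 2-tier package model above can be illustrated with a minimal sketch. The package names come from the slide's figure; the group keys and the `packages_for` helper are hypothetical, and the slide does not say which node receives which additional group.

```python
# Package tiers, named after the labels in the slide's figure.
COMMON = {"PKG-A", "PKG-B", "PKG-C"}
ADDITIONAL = {
    "additional-1": {"PKG-D", "PKG-E"},
    "additional-2": {"PKG-F"},
}


def packages_for(node_groups):
    """Return the package set for one node: the common tier plus its additional groups."""
    pkgs = set(COMMON)
    for group in node_groups:
        pkgs |= ADDITIONAL[group]
    return pkgs


# Assumed node-to-group assignment, for illustration only.
NODE_GROUPS = {
    "node1": ["additional-1"],
    "node2": ["additional-1"],
    "node3": ["additional-2"],
    "node4": [],
}

if __name__ == "__main__":
    for node, groups in NODE_GROUPS.items():
        print(node, sorted(packages_for(groups)))
```

Keeping the common tier separate means an update to PKG-A touches one definition, while node-specific differences stay confined to the small additional groups.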
6 High Availability Features
- The important nodes have redundancy.
  - Control nodes (installer nodes)
  - Job management nodes
  - Job management sub nodes
  - File servers (Meta Data Server / Object Storage Server)
- Fully automatic failover
  - Job management nodes and sub nodes run in hot standby mode.
  - Job execution continues even if the job management node fails.
  - Rapid failover without time lag
[Figure: user jobs and redundant job management nodes with synchronized data; on failure the other node is active.]
8 Job Operations Management
- Same job operations in FX10 and PRIMERGY
- Efficient, fair and system-optimal resource usage
  - Backfill scheduling
  - Fair share scheduling
  - System-optimal job scheduling
- Resource / access control
  - Elapsed time / CPU time / physical memory
  - Permission of job operation commands
- Reduce OS jitter / power saving control
[Figure: schedules with backfilling disabled and backfilling enabled (time axis: now, t1, t2, t3), showing the running job and Jobs A, B and C.]
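Backfill scheduling is named on the slide but not explained. The following is a minimal sketch of the common "EASY" backfill rule, not the Technical Computing Suite scheduler itself: the highest-priority waiting job gets a reservation, and a lower-priority job may start early only if it does not delay that reservation. The `Job` class, the `backfill` function and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    walltime: int  # requested elapsed-time limit


def backfill(total_nodes, running, queue, now=0):
    """Return names of queued jobs that may start at `now`.

    running: list of (end_time, nodes) for jobs already executing.
    queue:   list of Job in priority order.
    """
    free = total_nodes - sum(n for _, n in running)
    started = []
    pending = list(queue)

    # Start jobs in priority order for as long as they fit.
    while pending and pending[0].nodes <= free:
        job = pending.pop(0)
        free -= job.nodes
        running = running + [(now + job.walltime, job.nodes)]
        started.append(job.name)
    if not pending:
        return started

    # Reserve for the blocked head job: earliest time enough nodes are free.
    head = pending[0]
    avail = free
    shadow_time = None
    for end, n in sorted(running):
        avail += n
        if avail >= head.nodes:
            shadow_time = end
            break
    if shadow_time is None:
        return started  # head job is larger than the whole system
    extra = avail - head.nodes  # nodes still spare once the head job starts

    # Backfill: a later job may start now if it cannot delay the head job.
    for job in pending[1:]:
        if job.nodes > free:
            continue
        if now + job.walltime <= shadow_time:
            free -= job.nodes
            started.append(job.name)
        elif job.nodes <= extra:
            free -= job.nodes
            extra -= job.nodes
            started.append(job.name)
    return started
```

With 8 nodes, 6 of them busy until t=10, and a queue of A (8 nodes), B (2 nodes, 5 time units) and C (2 nodes, 20 time units), only B is started early: it finishes before A's reservation, whereas C would push A back. This is also why the elapsed-time limit listed under resource control matters: the rule relies on each job declaring how long it may run.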
9 Optimal Job Scheduling for FX10
- Interconnect topology-aware resource assignment
- One interconnect unit: 12 nodes (2 x 3 x 2)
- Job assignment rule: rectangular solid shape
  - Guarantees neighbor communication
  - Avoids interfering with other jobs
- Rotates nodes to reduce fragmentation
[Figure: x/y/z node grid showing in-use and unoccupied nodes.]
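The rectangular-solid assignment rule with rotation can be sketched as a search over a 3D occupancy grid. This is a simplified illustration under stated assumptions: the grid is a plain 3D mesh without torus wrap-around, "rotation" is modeled as permuting the requested shape's axes, and `find_cuboid` is a hypothetical helper rather than the scheduler's actual algorithm.

```python
from itertools import permutations, product


def find_cuboid(used, dims, shape):
    """Find a free axis-aligned cuboid in a 3D node grid.

    used:  set of occupied (x, y, z) coordinates.
    dims:  grid size (X, Y, Z).
    shape: requested job shape (a, b, c); every axis permutation is tried.
    Returns (origin, rotated_shape) for the first free placement, or None.
    """
    for rot in sorted(set(permutations(shape))):
        # Candidate origins for this orientation; empty if it cannot fit.
        ranges = [range(d - s + 1) for d, s in zip(dims, rot)]
        for origin in product(*ranges):
            cells = product(*(range(o, o + s) for o, s in zip(origin, rot)))
            if all(cell not in used for cell in cells):
                return origin, rot
    return None
```

For example, a 2 x 2 x 4 request on a 4 x 2 x 2 grid only fits after rotation to 4 x 2 x 2, which is the kind of case where trying other orientations avoids leaving nodes idle.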
10 Optimal Job Scheduling for FX10
- Asynchronous file staging
  - Stage IN: asynchronously transfers files from the global to the local file system before the job starts.
  - Stage OUT: asynchronously transfers files from the local to the global file system after the job ends.
- Co-scheduling of computation and file transfer
[Figure: PRIMEHPC FX10 nodes, IO nodes and local file system connected to the login nodes and the global file system (data storage area) over the IO network (IB) and management network (GbE); timeline (now, t1, t2, t3) in which the stage-in and stage-out of Jobs A, B and C overlap with running jobs.]
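The benefit of co-scheduling computation and file transfer can be estimated with a small timeline model. This is an illustrative sketch only: it assumes jobs run one after another on the same nodes, stage-in transfers run back to back ahead of the computation, stage-out does not contend with stage-in, and local storage is never the bottleneck. The function names and durations are hypothetical.

```python
def makespan_sync(jobs):
    """Total time when each job stages in, runs, and stages out in sequence.

    jobs: list of (stage_in, run, stage_out) durations.
    """
    return sum(stage_in + run + stage_out for stage_in, run, stage_out in jobs)


def makespan_async(jobs):
    """Total time when staging overlaps with the computation of other jobs."""
    stage_in_done = 0  # time at which the stage-in stream has finished so far
    compute_free = 0   # time at which the compute nodes become free
    finish = 0
    for stage_in, run, stage_out in jobs:
        stage_in_done += stage_in                # staged ahead of execution
        start = max(stage_in_done, compute_free)
        compute_free = start + run
        # Stage-out proceeds while the next job is already running.
        finish = max(finish, compute_free + stage_out)
    return finish


if __name__ == "__main__":
    jobs = [(2, 10, 3)] * 3
    print(makespan_sync(jobs), makespan_async(jobs))
```

In this model only the first stage-in and the last stage-out remain visible in the total time; every other transfer is hidden behind a running job.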
11 Optimal Job Scheduling for PRIMERGY
- Fine-grained node assignment
  - Selection method: balancing / concentration
  - Rank placement policy: pack / unpack
  - Priority control of allocated nodes
  - Execution mode: whether or not a node is occupied by a job
- Strict core assignment
  - Processes are bound to cores in the job territory.
  - No process can move to cores in another job's territory.
[Figure: nodes #0 to #2 with Jobs A to D placed by concentration; ranks R0 and R1 placed by rank pack and rank unpack; cores of one node divided between Job A and Job B.]
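The slide names the pack and unpack rank placement policies without defining them. One plausible reading, sketched below, is that "pack" fills the cores of one node before using the next, while "unpack" spreads consecutive ranks across nodes in round-robin order. The `place_ranks` function is hypothetical and is not the scheduler's interface.

```python
def place_ranks(n_ranks, nodes, cores_per_node, policy):
    """Return a list mapping each rank index to a node index.

    policy "pack":   consecutive ranks share a node until its cores are full.
    policy "unpack": consecutive ranks go to different nodes (round-robin).
    """
    if n_ranks > nodes * cores_per_node:
        raise ValueError("not enough cores for the requested ranks")
    if policy == "pack":
        return [rank // cores_per_node for rank in range(n_ranks)]
    if policy == "unpack":
        return [rank % nodes for rank in range(n_ranks)]
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    print(place_ranks(4, 2, 2, "pack"))    # ranks 0,1 on node 0; ranks 2,3 on node 1
    print(place_ranks(4, 2, 2, "unpack"))  # ranks alternate between node 0 and node 1
```

Under this reading, pack favors neighbor ranks that exchange data through shared memory, and unpack favors ranks that each need a node's full memory or network bandwidth.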
12 Reduce OS Jitter
- PRIMEHPC FX10
  - Stripped-down system processing
  - Minimizes OS jitter by using RDMA of Tofu
    a. / service health check
    b. System information monitor (remote sadc)
    c. Job information monitor (CPU time / used memory)
- PRIMERGY
  - Isolates OS jitter from jobs by using Hyper-Threading
  - Avoids conflict between jobs and OS jitter
[Figure: on FX10, an IO node reads node memory by RDMA through the ICC without involving the CPU; on PRIMERGY, each core has two hardware threads (HT), one for the job and one for the OS.]
14 High Scalability
- Achieved highly scalable IO performance with multiple OSSes (OSS: object storage server)
- Throughput increases with the number of servers as servers and storage are added.
- Adapted to various IO models
  - Parallel IO (MPI-IO)
  - Single stream IO
  - Master IO
[Figure: throughput versus number of servers; a shared file and individual files distributed across multiple OSSes.]
15 IO Bandwidth Guarantee
- Fair Share QoS: shares IO bandwidth among all users.
- Best Effort QoS: uses all IO bandwidth exhaustively.
[Figure: IO bandwidth between login nodes and file servers for User A and User B, not fair without Fair Share QoS and fair with it; with Best Effort QoS the bandwidth is either occupied by one client or shared by all clients (Clients A 70%, Clients B 30%).]
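How a fair share and best-effort use of leftover bandwidth can combine is shown below with max-min fair allocation. This is a generic technique used as a stand-in, since the slide does not describe the file system's actual QoS algorithm; `max_min_share` and the demand figures are hypothetical.

```python
def max_min_share(capacity, demands):
    """Max-min fair split of `capacity` among users.

    demands: dict mapping user -> requested bandwidth.
    Every user gets an equal share; any share a user does not need is
    redistributed to the users who still want more.
    """
    alloc = {user: 0.0 for user in demands}
    remaining = float(capacity)
    active = {user for user, demand in demands.items() if demand > 0}
    while active and remaining > 1e-12:
        share = remaining / len(active)
        satisfied = {u for u in active if demands[u] - alloc[u] <= share}
        if not satisfied:
            # Nobody can be fully served: split what is left equally.
            for user in active:
                alloc[user] += share
            break
        for user in satisfied:
            remaining -= demands[user] - alloc[user]
            alloc[user] = float(demands[user])
        active -= satisfied
    return alloc


if __name__ == "__main__":
    print(max_min_share(100, {"A": 90, "B": 90}))  # both limited to an equal share
    print(max_min_share(100, {"A": 500}))          # a single user may use everything
```

The two calls mirror the two behaviors on the slide: heavy users are held to an equal share when they compete, and a lone client is allowed to occupy the full bandwidth.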
16 High Reliability and High Availability
- Avoids single points of failure through redundant hardware and failover mechanisms.
[Figure: file management server with monitoring and managing software; redundant IB switches and network paths; active and standby MDS with failover and RAID; active/active OSS pairs (dual server) with failover, redundant disk paths and RAID.]
17 Performance: I/O Throughput of FEFS
- Achieved the world's top-level throughput on the K computer using over 2,500 OSSes.
  - Write: 965 GB/s
  - Read: 1486 GB/s
- Serious problems were encountered along the way: memory shortage and system noise issues.
- Collaborative work with RIKEN
[Figure: IOR write and IOR read throughput in GB/s (direct, file per process) versus number of OSS.]
19 VISIMPACT thread & process Hybrid-Parallel Programming
- Hybrid parallelism: auto-parallel + MPI
  - Automatic thread parallelization within a chip
  - Process parallelism by MPI
- VISIMPACT: CPU architecture for low-overhead parallelism among cores
  - Inter-core hardware barrier
  - Shared L2 cache
  - Automatic parallelization
- Tofu barrier facility for collective communication
20 Customized MPI Library for High Scalability
- Point-to-point communication
  - The transfer method is selected according to the data length, process location and number of hops.
- Collective communication
  - Barrier, Allreduce, Bcast and Reduce use the Tofu barrier & reduction facility.
  - Bcast, Allgather, Allgatherv, Allreduce and Alltoall use Tofu-optimized algorithms.
- Quotation from K computer performance data
21 Application Tuning Cycle and Tools
- Cycle: execution, collecting job information, analysis & tuning
- Tuning targets: overall tuning, MPI tuning, CPU tuning
- Tools shown: Tofu-PA, Profiler, Profiler snapshot, PAPI, Vampir-trace, RMATT (rank mapping)
- Legend: Fujitsu tools (e.g. Profiler), open source tools (e.g. Vampir-trace)
22 Rank Mapping Optimization (RMATT)
- Input: network construction and communication pattern (communication processing contents between ranks)
- Output: optimized rank map that reduces the number of hops and congestion
- Example: applying RMATT to MPI_Allgather
  - Number of ranks: 4096
  - Network construction: 16x16x16 nodes (4096)
  - x,y,z order mapping: 22.3 ms
  - After remapping with RMATT: 5.5 ms, about 4 times the performance
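The quantity a rank-mapping tool tries to reduce can be made concrete with a hop-count sketch. It assumes a simple 3D torus (the real Tofu interconnect is a 6D mesh/torus) and hypothetical helper names; it evaluates a given mapping and does not reproduce RMATT's optimization.

```python
def torus_hops(a, b, dims):
    """Shortest hop count between coordinates a and b on a torus of size dims."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def total_hops(pairs, mapping, dims):
    """Sum of hops over all communicating rank pairs under a rank-to-node mapping."""
    return sum(torus_hops(mapping[src], mapping[dst], dims) for src, dst in pairs)


if __name__ == "__main__":
    dims = (4, 1, 1)
    ring = [(0, 1), (1, 2), (2, 3), (3, 0)]  # each rank sends to its successor
    in_order = {0: (0, 0, 0), 1: (1, 0, 0), 2: (2, 0, 0), 3: (3, 0, 0)}
    scrambled = {0: (0, 0, 0), 1: (2, 0, 0), 2: (1, 0, 0), 3: (3, 0, 0)}
    print(total_hops(ring, in_order, dims), total_hops(ring, scrambled, dims))
    # Speedup reported on the slide: 22.3 ms before, 5.5 ms after remapping.
    print(round(22.3 / 5.5, 2))
```

The same communication pattern costs fewer hops when communicating ranks sit on neighboring nodes. The slide's measured times give a ratio of 22.3 / 5.5, roughly 4.05, consistent with the stated fourfold improvement.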
Dell EMC Ready Bundle for HPC Digital Manufacturing Dassault Systѐmes Simulia Abaqus Performance This Dell EMC technical white paper discusses performance benchmarking results and analysis for Simulia
More informationBest Practices for Setting BIOS Parameters for Performance
White Paper Best Practices for Setting BIOS Parameters for Performance Cisco UCS E5-based M3 Servers May 2013 2014 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public. Page
More informationXyratex ClusterStor6000 & OneStor
Xyratex ClusterStor6000 & OneStor Proseminar Ein-/Ausgabe Stand der Wissenschaft von Tim Reimer Structure OneStor OneStorSP OneStorAP ''Green'' Advancements ClusterStor6000 About Scale-Out Storage Architecture
More informationNetworking for Data Acquisition Systems. Fabrice Le Goff - 14/02/ ISOTDAQ
Networking for Data Acquisition Systems Fabrice Le Goff - 14/02/2018 - ISOTDAQ Outline Generalities The OSI Model Ethernet and Local Area Networks IP and Routing TCP, UDP and Transport Efficiency Networking
More informationOak Ridge National Laboratory Computing and Computational Sciences
Oak Ridge National Laboratory Computing and Computational Sciences OFA Update by ORNL Presented by: Pavel Shamis (Pasha) OFA Workshop Mar 17, 2015 Acknowledgments Bernholdt David E. Hill Jason J. Leverman
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More information2008 International ANSYS Conference
2008 International ANSYS Conference Maximizing Productivity With InfiniBand-Based Clusters Gilad Shainer Director of Technical Marketing Mellanox Technologies 2008 ANSYS, Inc. All rights reserved. 1 ANSYS,
More informationFuture Trends in Hardware and Software for use in Simulation
Future Trends in Hardware and Software for use in Simulation Steve Feldman VP/IT, CD-adapco April, 2009 HighPerformanceComputing Building Blocks CPU I/O Interconnect Software General CPU Maximum clock
More informationFeedback on BeeGFS. A Parallel File System for High Performance Computing
Feedback on BeeGFS A Parallel File System for High Performance Computing Philippe Dos Santos et Georges Raseev FR 2764 Fédération de Recherche LUmière MATière December 13 2016 LOGO CNRS LOGO IO December
More informationFujitsu s Technologies Leading to Practical Petascale Computing: K computer, PRIMEHPC FX10 and the Future
Fujitsu s Technologies Leading to Practical Petascale Computing: K computer, PRIMEHPC FX10 and the Future November 16 th, 2011 Motoi Okuda Technical Computing Solution Unit Fujitsu Limited Agenda Achievements
More informationIntra-MIC MPI Communication using MVAPICH2: Early Experience
Intra-MIC MPI Communication using MVAPICH: Early Experience Sreeram Potluri, Karen Tomko, Devendar Bureddy, and Dhabaleswar K. Panda Department of Computer Science and Engineering Ohio State University
More informationSNAP Performance Benchmark and Profiling. April 2014
SNAP Performance Benchmark and Profiling April 2014 Note The following research was performed under the HPC Advisory Council activities Participating vendors: HP, Mellanox For more information on the supporting
More informationCluster Network Products
Cluster Network Products Cluster interconnects include, among others: Gigabit Ethernet Myrinet Quadrics InfiniBand 1 Interconnects in Top500 list 11/2009 2 Interconnects in Top500 list 11/2008 3 Cluster
More informationHarp-DAAL for High Performance Big Data Computing
Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big
More informationAcuSolve Performance Benchmark and Profiling. October 2011
AcuSolve Performance Benchmark and Profiling October 2011 Note The following research was performed under the HPC Advisory Council activities Participating vendors: AMD, Dell, Mellanox, Altair Compute
More informationAgenda. System Performance Scaling of IBM POWER6 TM Based Servers
System Performance Scaling of IBM POWER6 TM Based Servers Jeff Stuecheli Hot Chips 19 August 2007 Agenda Historical background POWER6 TM chip components Interconnect topology Cache Coherence strategies
More informationNon-uniform memory access machine or (NUMA) is a system where the memory access time to any region of memory is not the same for all processors.
CS 320 Ch. 17 Parallel Processing Multiple Processor Organization The author makes the statement: "Processors execute programs by executing machine instructions in a sequence one at a time." He also says
More informationEfficient Object Storage Journaling in a Distributed Parallel File System
Efficient Object Storage Journaling in a Distributed Parallel File System Presented by Sarp Oral Sarp Oral, Feiyi Wang, David Dillow, Galen Shipman, Ross Miller, and Oleg Drokin FAST 10, Feb 25, 2010 A
More information