Advanced Software for the Supercomputer PRIMEHPC FX10. Copyright 2011 FUJITSU LIMITED
1 Advanced Software for the Supercomputer PRIMEHPC FX10
2 System Configuration of PRIMEHPC FX10
- Login nodes (user): login, compilation, job submission
- 6D mesh/torus interconnect
- Local file system (temporary area occupied by jobs)
- Global file system (data storage area)
- Data transfer to/from the global file system
- Data communication for system and job operations management
- IO network (IB), management network (GbE)
- Job management nodes, management nodes, file management nodes, control nodes
- System operations management, job operations management
- System integration node (administrator)
3 System Software Stack
- User/ISV applications
- HPC Portal / System Management Portal
- File system, operations management
- System operations management: system configuration management, system control, system monitoring, system installation & operation
- Job operations management: job manager, job scheduler, resource management, parallel execution environment
- High-performance file system: Lustre-based distributed file system, high scalability, IO bandwidth guarantee, high reliability & availability
- Application development environment
- VISIMPACT™: shared L2 cache on a chip, hardware intra-processor synchronization
- Compilers: hybrid parallel programming, sector cache support, SIMD / register file extensions
- MPI library: scalability of High-Func. Barrier Comm.
- Support tools: IDE, profiler & tuning tools, interactive debugger
- Linux-based enhanced operating system: enhanced hardware support, system noise reduction, error detection / low power
- PRIMEHPC FX10
4 OS (Linux-based enhanced Operating System)
- Easy porting of existing applications
  - POSIX API: Linux kernel 2.6.x, glibc 2.x
- High performance / high scalability
  - Enhanced hardware support: CPU registers, large memory pages, high-speed interconnect
  - Reduced system noise in highly parallel programs: inter-node OS scheduling
- High availability / low power consumption
  - Hardware error detection and isolation: memory patrol, IO driver enhancements
  - CPU suspend during system idle state
- Figure labels: daemon services = system noise; application running; idle wait; synchronous daemon scheduling; idle CPU suspend; job running
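The effect of synchronous daemon scheduling on a barrier-synchronized program can be illustrated with a toy model. This is only a sketch under stated assumptions, not a description of the FX10 operating system: every name and number below (`total_time`, `work`, `noise`, `period`) is hypothetical. Each iteration costs `work` time units and ends with a barrier, and a daemon on each node steals `noise` time units once every `period` iterations. If the daemons fire at staggered iterations, almost every iteration is delayed by some node; if they all fire in the same iteration, the delay is paid only once per period.

```python
def total_time(num_nodes, iterations, work, noise, period, synchronized):
    """Elapsed time of a barrier-synchronized loop in a toy noise model.

    An iteration is delayed by `noise` if the daemon of at least one node
    fires during it, because all nodes wait at the barrier for the slowest.
    """
    total = 0.0
    for it in range(iterations):
        # Synchronized daemons share offset 0; otherwise offsets are staggered.
        hit = any(
            (it + (0 if synchronized else node % period)) % period == 0
            for node in range(num_nodes)
        )
        total += work + (noise if hit else 0.0)
    return total


if __name__ == "__main__":
    args = dict(num_nodes=8, iterations=100, work=1.0, noise=0.5, period=10)
    print(total_time(synchronized=False, **args))  # 140.0
    print(total_time(synchronized=True, **args))   # 105.0
```

In this model the staggered case delays 80 of the 100 iterations, while the synchronized case delays only 10.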
6 System Operations Management
- Hierarchical structure for efficient system operation and adaptability to large-scale systems
  - The load is distributed by using job management sub nodes.
- Easy to operate with a single system image
  - The system is operated efficiently by dividing it into logical resource partitions called Resource Units.
- Figure labels: system administrator; control node (power control, hardware monitoring, software service monitoring); job management node (job operations management); group #1 and group #2, each with a job management sub node and IO nodes; cluster; nodes; Tofu interconnect; Resource Unit #1 and Resource Unit #2 (logical resource partitions)
7 High Availability System
- The important nodes have redundancy: control node, job management node, job management sub node, file servers
- Example (figure): job execution continues even if the job management node has failed
  - Job data is always synchronized between the active node and the stand-by node.
  - Operation switches to the stand-by node if the active node goes down.
- Figure labels: user; job management nodes (active, stand-by); data sync; failure; jobs
8 Job Operations Environment
- Efficient resource usage
  - Flexible job scheduling based on prioritized resource assignment
  - Interconnect topology-aware resource assignment
  - Backfill scheduling for keeping the resources busy
  - Asynchronous file staging
- High availability
  - Avoids assigning faulty resources to jobs
  - Disconnects faulty nodes from job operations
  - Management nodes with failover support
- Figure: timelines of the running job and Jobs A, B and C with backfilling disabled and with backfilling enabled
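Backfill scheduling, as named on this slide, can be sketched as follows. This is a generic textbook-style ("EASY") backfill under stated assumptions, not the FX10 job scheduler: the `Job` fields, the function name and the use of a user-supplied runtime limit are all hypothetical. The highest-priority waiting job gets a reservation at the earliest time enough nodes will be free, and lower-priority jobs may start now only if they do not delay that reservation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    runtime: int  # assumed user-supplied time limit


def pick_jobs_to_start(now, free_nodes, running, queue):
    """Return names of jobs to start at `now`.

    running: list of (end_time, nodes) for jobs already executing.
    queue:   waiting Jobs in priority order (highest first).
    """
    started = []
    queue = list(queue)
    running = list(running)
    # Start jobs strictly in priority order while they fit.
    while queue and queue[0].nodes <= free_nodes:
        job = queue.pop(0)
        free_nodes -= job.nodes
        running.append((now + job.runtime, job.nodes))
        started.append(job.name)
    if not queue:
        return started
    # The head job does not fit: find its reservation ("shadow") time.
    head = queue[0]
    avail = free_nodes
    shadow_time = None
    for end_time, nodes in sorted(running):
        avail += nodes
        if avail >= head.nodes:
            shadow_time = end_time
            break
    if shadow_time is None:
        return started  # head job is larger than the whole machine
    extra = avail - head.nodes  # nodes the head job will not need
    # Backfill: later jobs may start if they cannot delay the head job.
    for job in queue[1:]:
        if job.nodes > free_nodes:
            continue
        ends_in_time = now + job.runtime <= shadow_time
        if ends_in_time or job.nodes <= extra:
            free_nodes -= job.nodes
            if not ends_in_time:
                extra -= job.nodes
            started.append(job.name)
    return started


if __name__ == "__main__":
    waiting = [Job("A", 8, 20), Job("B", 6, 20), Job("C", 4, 10)]
    # 4 nodes free now, 6 more become free at t=10.
    print(pick_jobs_to_start(0, 4, [(10, 6)], waiting))  # ['C']
```

Here Job A must wait until t=10, so Job C can use the four idle nodes in the meantime because it finishes by t=10.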
9 Resource Assignment
- Interconnect topology-aware resource assignment
  - Treats 12 compute nodes as one interconnect unit
  - Assigns cubic-shaped interconnect unit(s) to a job
  - Using adjacent interconnect units suits contiguous communication and avoids interfering with other jobs.
- Optimizes the alignment of resources
  - Rotating the cubic-shaped interconnect units improves total system utilization.
- Figure: x/y/z grid of interconnect units, marked in-use or unoccupied
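The idea of rotating a cuboid request so that it fits the free space can be sketched with a small first-fit search. This is only an illustration under stated assumptions, not the FX10 resource manager: the grid cells stand for interconnect units, torus wraparound is ignored, and the names `empty_grid` and `find_placement` are hypothetical.

```python
from itertools import permutations, product


def empty_grid(dx, dy, dz):
    """Grid of interconnect units; False means unoccupied."""
    return [[[False] * dz for _ in range(dy)] for _ in range(dx)]


def find_placement(occupied, shape):
    """First-fit search for a free cuboid, trying every axis rotation.

    occupied[x][y][z] is True if that unit is in use.
    Returns (origin, rotated_shape) or None if nothing fits.
    """
    dims = (len(occupied), len(occupied[0]), len(occupied[0][0]))
    for rot in sorted(set(permutations(shape))):
        if any(r > d for r, d in zip(rot, dims)):
            continue  # this orientation is larger than the grid
        origins = product(*(range(d - r + 1) for d, r in zip(dims, rot)))
        for origin in origins:
            cells = product(*(range(o, o + r) for o, r in zip(origin, rot)))
            if all(not occupied[x][y][z] for x, y, z in cells):
                return origin, rot
    return None


if __name__ == "__main__":
    grid = empty_grid(3, 2, 1)
    grid[1][0][0] = True  # one unit already in use
    # A 1x3x1 request does not fit as given (y has only 2 units),
    # but the rotated 3x1x1 shape fits in the free row y=1.
    print(find_placement(grid, (1, 3, 1)))  # ((0, 1, 0), (3, 1, 1))
```

A production allocator would also need to account for the torus topology and for fragmentation; the sketch only shows why allowing rotations can turn an unplaceable request into a placeable one.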
11 High Scalability
- Highly scalable IO performance is achieved with multiple OSSes (OSS: object storage server).
- Supports various IO models: parallel IO (MPI-IO), single-stream IO, master IO
- Figure: throughput versus number of servers as servers and storage are added; a shared file and individual files distributed across OSSes
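One common way a shared file is spread over several object storage servers is round-robin striping, which is how Lustre-style file systems generally lay out file data. The following is a minimal sketch of that mapping under stated assumptions; the function name `locate` and the stripe parameters are hypothetical, and nothing here is taken from the FX10 file system itself.

```python
def locate(offset, stripe_size, stripe_count):
    """Map a byte offset in a file to (server index, offset within its object).

    Stripes of `stripe_size` bytes are assigned round-robin to
    `stripe_count` servers (RAID-0 style).
    """
    stripe_index = offset // stripe_size
    server = stripe_index % stripe_count
    object_offset = (stripe_index // stripe_count) * stripe_size + offset % stripe_size
    return server, object_offset


if __name__ == "__main__":
    MIB = 1 << 20
    for off in (0, MIB, 4 * MIB, 5 * MIB + 10):
        print(off, locate(off, stripe_size=MIB, stripe_count=4))
```

Because consecutive stripes land on different servers, clients reading or writing different regions of the same file can be served by different servers at the same time.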
12 IO Bandwidth Guarantee
- Fair Share QoS: shares IO bandwidth among all users
  - Without Fair Share QoS: not fair between User A and User B
  - With Fair Share QoS: fair between User A and User B
- Best Effort QoS: uses all available IO bandwidth
  - Bandwidth is either occupied by one client or shared by all clients (Client(s) A, Client(s) B)
- Figure labels: login, IO bandwidth, file servers, clients
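The slide does not say how the fair share is computed. One common definition is max-min fairness, sketched below purely as an assumption: no user gets more than it asks for, and bandwidth left over by light users is split evenly among the heavy ones. The function name, user names and numbers are hypothetical.

```python
def fair_share(capacity, demands):
    """Max-min fair split of `capacity` among users with given demands."""
    alloc = {user: 0.0 for user in demands}
    remaining = dict(demands)
    cap = float(capacity)
    while remaining and cap > 1e-12:
        share = cap / len(remaining)
        # Users asking for no more than an equal share are fully satisfied.
        satisfied = {u: d for u, d in remaining.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than the equal share: split evenly.
            for u in remaining:
                alloc[u] += share
            break
        for u, d in satisfied.items():
            alloc[u] += d
            cap -= d
            del remaining[u]
    return alloc


if __name__ == "__main__":
    # 10 units of IO bandwidth; A and B are heavy users, C is light.
    print(fair_share(10, {"A": 9, "B": 9, "C": 2}))
    # {'A': 4.0, 'B': 4.0, 'C': 2.0}
```

Without such a policy, a single heavy client could occupy the whole bandwidth, which is the "not fair" case the slide contrasts it with.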
13 High Reliability and High Availability
- Avoids single points of failure through redundant hardware and failover mechanisms
- Figure labels: monitoring & managing software; file management server; MDS (active) and MDS (standby) with MDS failover; OSS (active) pairs with OSS failover (dual server); IB switches (network path failover); RAID (disk path)
15 VISIMPACT™ (Virtual Single Processor by Integrated Multi-core Parallel Architecture)
- Mechanism that treats multiple cores as one high-speed CPU
  - Easy and efficient execution of inter-core thread-parallel processing on a multi-core CPU
  - Supports a highly efficient hybrid model (automatic parallelization + MPI)
- CPU technologies
  - Large-capacity shared L2 cache memory: reduces the influence of false sharing
  - Inter-core hardware barrier facilities: 6-10 times faster than a conventional software barrier
- Figure: one process per core with barrier synchronization between processes, compared with one process whose threads 1 to N run on the cores of a CPU sharing the L2 cache (inter-core thread-parallel processing); hardware barrier synchronization is 10 times faster than in a conventional system
17 Programming Model for High Scalability
- Hybrid parallelism by VISIMPACT and the MPI library
  - VISIMPACT: automated multi-thread parallelization; high-performance thread barrier using the inter-core hardware barrier facility
  - MPI library: high-performance collective communications using the Tofu barrier facility
- Figure: scalability of the Himeno benchmark (XL size), performance ratio versus number of cores, for Hybrid + Tofu barrier, Hybrid, and Flat MPI (quotation from K computer performance data)
- Figure: Himeno benchmark detail (65536 cores), time ratio split into collective communication and neighbor communication and calculation, for Hybrid + Tofu barrier, Hybrid, and Flat MPI (quotation from K computer performance data)
18 Customized MPI Library for High Scalability
- Point-to-point communication
  - Uses a special low-latency path that bypasses the software layer
  - Optimizes the transfer method according to data length, process location and number of hops
- Collective communication
  - High-performance Barrier, Allreduce, Bcast and Reduce using the Tofu barrier facility
  - Scalable Bcast, Allgather, Allgatherv, Allreduce and Alltoall algorithms optimized for the Tofu network
- Figure: Barrier / Allreduce performance, elapsed time (microseconds) versus "Number of s (48x6xZ)" from 0 to 10,000, for Barrier (Tofu barrier), Barrier (software), Allreduce (Tofu barrier) and Allreduce (software) (quotation from K computer performance data)
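For context on the "software" curves, a software Allreduce typically needs a number of communication rounds that grows with the process count. The sketch below simulates one well-known generic algorithm, recursive doubling, in a single Python process. It is an assumption chosen for illustration: it is not the Fujitsu MPI implementation, it does not model the Tofu barrier hardware, and the function name is hypothetical.

```python
def allreduce_recursive_doubling(values):
    """Simulate a sum-Allreduce over len(values) ranks.

    In each round, rank r exchanges its partial sum with rank r XOR dist
    and adds the two. Returns (per-rank results, number of rounds).
    """
    p = len(values)
    if p < 1 or p & (p - 1):
        raise ValueError("this sketch needs a power-of-two number of ranks")
    vals = list(values)
    rounds = 0
    dist = 1
    while dist < p:
        # All ranks exchange simultaneously; build the new state from the old.
        vals = [vals[r] + vals[r ^ dist] for r in range(p)]
        dist *= 2
        rounds += 1
    return vals, rounds


if __name__ == "__main__":
    result, rounds = allreduce_recursive_doubling([1, 2, 3, 4, 5, 6, 7, 8])
    print(result)  # [36, 36, 36, 36, 36, 36, 36, 36]
    print(rounds)  # 3
```

With P ranks the loop runs log2(P) rounds, each costing at least one message latency, which is one reason a software collective tends to slow down as the machine grows.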
19 Compiler Optimization for High Performance
- Instruction-level parallelism with SIMD instructions
- Improved computing efficiency using expanded registers
- Improved cache efficiency using the sector cache
- Figure: NPB3.3 LU execution time comparison (relative values), FX1 versus PRIMEHPC FX10, broken down into memory wait, operation wait, cache misses and instructions committed; efficient use of expanded registers reduces operation wait
- Figure: NPB3.3 MG execution time comparison (relative values), FX1 versus PRIMEHPC FX10, with the same breakdown; the SIMD implementation reduces instructions committed
20 Application Tuning Cycle and Tools
- Tuning cycle: execution, MPI tuning, CPU tuning, overall tuning
- Tools shown: job information, profiler, profiler snapshot, Vampir-trace, RMATT, Tofu-PA, PAPI
- Legend: FX10-specific tools, open source tools
HIGH PERFORMANCE COMPUTING FROM SUN Update for IDC HPC User Forum, Norfolk, VA April 2008 Bjorn Andersson Director, HPC and Integrated Systems Sun Microsystems Sun Constellation System Integrating the
More informationDesigning Power-Aware Collective Communication Algorithms for InfiniBand Clusters
Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters Krishna Kandalla, Emilio P. Mancini, Sayantan Sur, and Dhabaleswar. K. Panda Department of Computer Science & Engineering,
More informationThe Road to ExaScale. Advances in High-Performance Interconnect Infrastructure. September 2011
The Road to ExaScale Advances in High-Performance Interconnect Infrastructure September 2011 diego@mellanox.com ExaScale Computing Ambitious Challenges Foster Progress Demand Research Institutes, Universities
More informationLecture 9: MIMD Architectures
Lecture 9: MIMD Architectures Introduction and classification Symmetric multiprocessors NUMA architecture Clusters Zebo Peng, IDA, LiTH 1 Introduction MIMD: a set of general purpose processors is connected
More informationHigh Performance Computing Systems
High Performance Computing Systems Multikernels Doug Shook Multikernels Two predominant approaches to OS: Full weight kernel Lightweight kernel Why not both? How does implementation affect usage and performance?
More informationPorting Applications to Blue Gene/P
Porting Applications to Blue Gene/P Dr. Christoph Pospiech pospiech@de.ibm.com 05/17/2010 Agenda What beast is this? Compile - link go! MPI subtleties Help! It doesn't work (the way I want)! Blue Gene/P
More informationCRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart
CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Hao Wang, Jian Huang, Dhabaleswar K. Panda Department of Computer
More informationXyratex ClusterStor6000 & OneStor
Xyratex ClusterStor6000 & OneStor Proseminar Ein-/Ausgabe Stand der Wissenschaft von Tim Reimer Structure OneStor OneStorSP OneStorAP ''Green'' Advancements ClusterStor6000 About Scale-Out Storage Architecture
More informationSNAP Performance Benchmark and Profiling. April 2014
SNAP Performance Benchmark and Profiling April 2014 Note The following research was performed under the HPC Advisory Council activities Participating vendors: HP, Mellanox For more information on the supporting
More informationCompiler Technology That Demonstrates Ability of the K computer
ompiler echnology hat Demonstrates Ability of the K computer Koutarou aki Manabu Matsuyama Hitoshi Murai Kazuo Minami We developed SAR64 VIIIfx, a new U for constructing a huge computing system on a scale
More informationMIMD Overview. Intel Paragon XP/S Overview. XP/S Usage. XP/S Nodes and Interconnection. ! Distributed-memory MIMD multicomputer
MIMD Overview Intel Paragon XP/S Overview! MIMDs in the 1980s and 1990s! Distributed-memory multicomputers! Intel Paragon XP/S! Thinking Machines CM-5! IBM SP2! Distributed-memory multicomputers with hardware
More informationIntroduction to parallel computers and parallel programming. Introduction to parallel computersand parallel programming p. 1
Introduction to parallel computers and parallel programming Introduction to parallel computersand parallel programming p. 1 Content A quick overview of morden parallel hardware Parallelism within a chip
More informationHPC Architectures. Types of resource currently in use
HPC Architectures Types of resource currently in use Reusing this material This work is licensed under a Creative Commons Attribution- NonCommercial-ShareAlike 4.0 International License. http://creativecommons.org/licenses/by-nc-sa/4.0/deed.en_us
More informationExperiences with HP SFS / Lustre in HPC Production
Experiences with HP SFS / Lustre in HPC Production Computing Centre (SSCK) University of Karlsruhe Laifer@rz.uni-karlsruhe.de page 1 Outline» What is HP StorageWorks Scalable File Share (HP SFS)? A Lustre
More informationOperating Systems Overview. Chapter 2
1 Operating Systems Overview 2 Chapter 2 3 An operating System: The interface between hardware and the user From the user s perspective: OS is a program that controls the execution of application programs
More informationMPI Performance Snapshot. User's Guide
MPI Performance Snapshot User's Guide MPI Performance Snapshot User s Guide Legal Information No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by
More informationAdvanced Message-Passing Interface (MPI)
Outline of the workshop 2 Advanced Message-Passing Interface (MPI) Bart Oldeman, Calcul Québec McGill HPC Bart.Oldeman@mcgill.ca Morning: Advanced MPI Revision More on Collectives More on Point-to-Point
More informationDELL EMC ISILON F800 AND H600 I/O PERFORMANCE
DELL EMC ISILON F800 AND H600 I/O PERFORMANCE ABSTRACT This white paper provides F800 and H600 performance data. It is intended for performance-minded administrators of large compute clusters that access
More informationBei Wang, Dmitry Prohorov and Carlos Rosales
Bei Wang, Dmitry Prohorov and Carlos Rosales Aspects of Application Performance What are the Aspects of Performance Intel Hardware Features Omni-Path Architecture MCDRAM 3D XPoint Many-core Xeon Phi AVX-512
More informationFujitsu High Performance CPU for the Post-K Computer
Fujitsu High Performance CPU for the Post-K Computer August 21 st, 2018 Toshio Yoshida FUJITSU LIMITED 0 Key Message A64FX is the new Fujitsu-designed Arm processor It is used in the post-k computer A64FX
More informationApplication Performance on IME
Application Performance on IME Toine Beckers, DDN Marco Grossi, ICHEC Burst Buffer Designs Introduce fast buffer layer Layer between memory and persistent storage Pre-stage application data Buffer writes
More informationECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective
ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective Part II: Data Center Software Architecture: Topic 3: Programming Models Piccolo: Building Fast, Distributed Programs
More informationHarp-DAAL for High Performance Big Data Computing
Harp-DAAL for High Performance Big Data Computing Large-scale data analytics is revolutionizing many business and scientific domains. Easy-touse scalable parallel techniques are necessary to process big
More informationUsing Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology
Using Industry Standards to Exploit the Advantages and Resolve the Challenges of Multicore Technology September 19, 2007 Markus Levy, EEMBC and Multicore Association Enabling the Multicore Ecosystem Multicore
More information