FUJITSU PHI Turnkey Solution


FUJITSU PHI Turnkey Solution: an integrated, ready-to-use XEON-PHI based platform
Dr. Pierre Lagier, ISC2014, Leipzig

PHI Turnkey Solution challenges

System performance challenges
- Best-architecture design and fine system tuning for parallel I/O, including the integration of SSD technology
- Known application performance bottlenecks, such as the PCI bus and its relationship to the host sockets

Application challenges
- The hybrid programming model and its performance issues (latency, synchronization overhead, sustained MPI bandwidth between PHI boards) must be addressed in parallel with the system performance challenges; a minimal sketch follows this list
- How will end users benefit from a web portal (PRIMERGY Gateway) that hides the heterogeneity and the related issues?

Environment challenges
- Full integration of all software components with the Cluster Deployment Manager tool
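The hybrid programming model referenced above combines MPI between ranks (on the host sockets and on the PHI boards) with OpenMP threading inside each rank. The sketch below is a minimal, generic C illustration of that model, not code from the turnkey solution; the funneled threading level and the final barrier are only there to show where latency and synchronization overhead come into play.

```c
/* Minimal hybrid MPI + OpenMP sketch (illustrative only, not part of
 * the Fujitsu turnkey solution). Each MPI rank, whether it runs on a
 * host socket or natively on a XEON-PHI board, spawns an OpenMP team. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, size;

    /* Request funneled threading: only the master thread calls MPI,
     * which keeps MPI-level synchronization overhead down. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    #pragma omp parallel
    {
        /* Per-thread compute work goes here. */
        #pragma omp master
        printf("rank %d of %d: %d OpenMP threads\n",
               rank, size, omp_get_num_threads());
    }

    /* Collectives like this barrier are where inter-board MPI latency
     * and sustained bandwidth become visible. */
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
```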

FUJITSU PHI CLUSTER

[Cluster diagram: the PHI Cluster built from RX350/S7 nodes, with a GigaBit switch, a TrueScale switch, the network, the parallel file system and user files]

PHI Cluster: Network (RX350/S7)

Ethernet and IB network
- Single GigaBit switch
  - Same shared subnet bridging XEON-PHI and all compute nodes
  - Only one cluster of heterogeneous compute nodes, XEON-PHI and dual Ivy-Bridge
- TrueScale IB switch
  - Dual-rail IB per compute node

PHI Cluster: File Systems (RX350/S7), efficiency-driven I/O

HOME file system
- On the login node
- NFS-mounted on the compute nodes

Local scratch
- On each SB node
- Mounted on the connected XP node

Parallel file system (FhGFS)
- Integrated into the SB nodes with one Intel SSD per node (3 TB global storage capacity over the minimal 8-node configuration)
- Each SB node is MDS/DS/client; each XP node is a client
- Simple policies: clients prefer the local MDS and DS, striping factor of 8 over all SB nodes (see the MPI-IO sketch after this list)
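To benefit from the striping factor of 8 over the SB nodes, an application can write a shared file collectively through MPI-IO instead of funnelling all output through one rank. The following C sketch is a minimal, hedged example under assumed names: the mount point /fhgfs/scratch and the 1 MiB block size are illustrative, not part of the documented configuration.

```c
/* Minimal MPI-IO sketch (illustrative only): every rank writes one
 * contiguous block to a shared file on the parallel file system, so
 * the FhGFS stripes across the SB nodes can be used in parallel.
 * The path "/fhgfs/scratch/output.dat" is an assumed mount point. */
#include <mpi.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK (1 << 20)   /* 1 MiB per rank, arbitrary example size */

int main(int argc, char **argv)
{
    int rank;
    MPI_File fh;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(BLOCK);
    memset(buf, rank & 0xff, BLOCK);  /* stand-in for application data */

    MPI_File_open(MPI_COMM_WORLD, "/fhgfs/scratch/output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Collective write at a rank-dependent offset; the MPI-IO layer
     * and the file system spread the blocks over the stripe targets. */
    MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK, buf, BLOCK,
                          MPI_BYTE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
```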

FUJITSU SOFTWARE STACK

FUJITSU HPC SW Stack

A mature software stack that includes specific software for:
- Deploying nodes and managing software packages
- A workload manager for job and resource management
- A parallel execution environment with libraries
- Tools for application development (as needed)
- Storage options (NFS, PFS)

These HPC software layers are always the same; variety exists only in the actual components used.

[Stack diagram: Fujitsu HPC Cluster Suite coverage across Fujitsu PRIMERGY HPC Clusters, drivers, the operating system (RedHat Linux, CentOS), cluster deployment and management (automated installation and configuration, administrator interface, operation and monitoring, cluster checker), the workload manager (management of cluster resources, serial and parallel jobs, fair-share usage between users), the parallel environment and parallel middleware (compilers, scientific libraries, performance and profiling tools, GPGPU and XEON Phi software support, parallel file system), application programs, and the graphical end-user interface with user environment management]

PRIMERGY HPC Gateway: foundation for application solution development
- The PRIMERGY HPC Gateway is the user interface component of the FUJITSU Software HPC Cluster Suite
- Intuitive web environment incorporating application workflows, direct simulation monitoring, data access and collaboration
- Value proposition based on simplifying HPC end use and integrating application expertise, to tune business processes and better manage projects
- The Gateway delivers additional value by simplifying HPC usage; shipping since 05/2013

THE GROMACS PLATFORM

The GROMACS Environment
- Based on GROMACS 5.0
  - Verlet cutoff scheme tuned for the XEON-PHI
  - Running natively on XEON-PHI with Fujitsu OpenMP tuning; MPI tuning still in progress
- Fully integrated into the HPC Gateway
  - Transparent run on the front node or on XEON-PHI, depending on the tools used as well as the cutoff scheme (Verlet on XEON-PHI)
  - Integration of key GROMACS components (grompp, mdrun, ...) with form-based parameter control
  - Real-time display of energies at run time
- True improvement: the protein-water test case (dhfr) runs 1.6 times faster on XEON-PHI than on a dual-socket Sandy-Bridge

GROMACS GUI