MAHA - Supercomputing System for Bioinformatics


MAHA - Supercomputing System for Bioinformatics - 2013.01.29

Outline
1. MAHA HW
2. MAHA SW
3. MAHA Storage System

ETRI HPC R&D Area - Overview

Research area: Computing HW - MAHA System HW
- Rpeak: 0.3 PetaFLOPS @ 2015, GPGPU/MIC based (GPGPU/MIC/memory nodes)
- High-speed/low-latency network
  . Management: Ethernet 1 Gbps
  . Computational: InfiniBand 40 Gbps
  . Computational: PCIe link 128/256 Gbps

Research area: I/O - SSD+HDD File System SW
- Max. 700 Gbps / 1M IOPS; 40 SSD storage servers (equal to 600 HDD servers)
- Power saving: dynamic power control on unused HDDs/servers (speed down / sleep / power off according to access rate)

<Figure: MAHA System Architecture>  <Figure: MAHA System Layout - Plan>

Research area: Bio Application
- Parallelized genome analysis SW
  . Parallelized genome analysis pipeline: optimized genome indexing, parallelized sequence mapping, parallelized SNP extraction and analysis, visualization
- Protein folding analysis SW
  . 3-dimensional protein mapping
  . Protein docking analysis and DB

Research area: System SW
- Bio workflow mgmt.: HPC environment supporting bio workflows; ease of use
- Heterogeneous resource mgmt.: resource mgmt. for bio applications; performance improvement
- Integrated cluster mgmt.: single point of mgmt. for the MAHA system; simplified deployment & mgmt.

MAHA Supercomputing System (Jan. 2013)

Heterogeneous supercomputer
- 104 TeraFLOPS with CPUs and accelerators (GPGPU, MIC)
  . 53.2 TeraFLOPS from the compute nodes based on GPGPU (NVIDIA M2090)
  . 51.3 TeraFLOPS from the compute nodes based on MIC (Intel Xeon Phi)
- Number of cores: over 36,000
- 54 compute nodes, 3 management nodes, 19 storage nodes
- 194 TeraBytes of storage (SSD = 34 TB, HDD = 160 TB)

GPGPU-based building blocks
- CPU: 166.4 GFLOPS/CPU, Intel Xeon E5, 8 cores @ 2.6 GHz
- GPGPU: 665 GFLOPS/GPGPU, NVIDIA Fermi, 512 cores @ 1.3 GHz
- Node: 1.67 TFLOPS, dual CPU + dual GPGPU, 32 GB memory
- Subrack: 8.3 TFLOPS, 5 nodes, 160 GB memory

MIC-based building blocks
- CPU: 166.4 GFLOPS/CPU, Intel Xeon E5, 8 cores @ 2.6 GHz
- MIC: 1 TFLOPS/MIC, Intel Xeon Phi, > 50 cores
- Node: 2.3 TFLOPS, dual CPU + dual Xeon Phi, 32 GB memory
- Subrack: 11.5 TFLOPS, 5 nodes, 160 GB memory
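As a consistency check of the figures above (assuming 8 double-precision FLOPs per cycle per Xeon E5 core, which is not stated on the slide):

\[ R_{peak}^{CPU} = 8\ \text{cores} \times 2.6\ \text{GHz} \times 8\ \text{FLOPs/cycle} = 166.4\ \text{GFLOPS} \]
\[ R_{peak}^{GPGPU\ node} = 2 \times 166.4 + 2 \times 665 = 1662.8\ \text{GFLOPS} \approx 1.67\ \text{TFLOPS}, \qquad 5 \times 1.67 \approx 8.3\ \text{TFLOPS per subrack} \]
\[ R_{peak}^{MIC\ node} = 2 \times 166.4 + 2 \times 1000 \approx 2.3\ \text{TFLOPS}, \qquad 5 \times 2.3 = 11.5\ \text{TFLOPS per subrack} \]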

MAHA Supercomputing System (Jan. 2013) - Node and Network Configuration

Network
- 10/1 Gbps management network: 1G/10G Ethernet for system management
- 40 Gbps computational network: 40 Gbps QDR InfiniBand

SSD storage server
- Motherboard, RAID controller (4 SATA2 ports) on PCI-E x4, backplane, SSDs over 3 Gbps SATA connections

MAID (HDD) storage server
- Motherboard, RAID controller (16 SATA2 ports) on PCI-E x4, backplane, HDDs over 3 Gbps SATA connections

Management nodes
- 1 user login node, 2 management nodes
- Dual CPU (Intel Xeon E5), 332 GFLOPS

Accelerated compute node
- Dual CPU (Intel Xeon E5), 332 GFLOPS, 32 GB memory
- Dual GPGPU (NVIDIA M2090), 1,330 GFLOPS, or dual MIC (Intel Xeon Phi), > 2,000 GFLOPS

<Figure: MAHA Supercomputing System (100 TeraFLOPS)>

MAHA Supercomputing System Performance (Jan. 2012)

Hybrid HPL (High Performance Linpack)
- Average R_max = 29.9 TeraFLOPS (avg. 29.901 TFLOPS, max. 30.310 TFLOPS)
- System efficiency = 56.2%
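The efficiency figure is consistent with R_max over R_peak, assuming the Jan. 2012 run covered the 53.2 TeraFLOPS GPGPU-based partition listed earlier (the slide does not state which partition was measured):

\[ \text{Efficiency} = \frac{R_{max}}{R_{peak}} \approx \frac{29.9\ \text{TFLOPS}}{53.2\ \text{TFLOPS}} \approx 56.2\% \]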

MAHA Supercomputing Facility
- Server room: 42 m²
- Hybrid cooling: cold outside air or internal air conditioning

<Figure: MAHA system>

MAHA Supercomputing Roadmap
- MAHA Supercomputing System: 200 TeraFLOPS in 2013
- In 2015, MAHA will reach 300 TeraFLOPS (R_peak)
- Roadmap: 2011: 50 TFLOPS, 2012: 100 TFLOPS, 2013: 200 TFLOPS, 2014: 250 TFLOPS, 2015: 300 TFLOPS

Outline
1. MAHA HW
2. MAHA SW
3. MAHA Storage System

MAHA System Workplace: Objective

HPC software solution specially designed for bioinformatics applications
- For end users (especially in the field of bioinformatics)
  . User-friendly HPC environment supporting bio workflows
  . End users can easily define workflows of bio applications and then efficiently execute them on HPC systems
- For system administrators
  . Integrated cluster management tool

* MIC: product based on the Intel Many Integrated Core architecture

MAHA System Workplace: Features & Benefits

Features
- User-friendly HPC environment supporting bio workflows
- Easy configuration for execution with the aid of workflow analysis
- Workflow transformation for efficient execution in an HPC environment
- Performance improvement through support for the execution of bio applications
- Single point of management for the MAHA system & services

Benefits
- For end users: easy to use even for non-experts; improved performance
- For system administrators: simplified deployment & management

MAHA System Workplace: Function (1/3)

Bio Workflow Management for the HPC Environment
- Bio workflow definition & execution management
  . XML-based workflow model
  . Web UI-based workflow lifecycle management
- Bio workflow execution engine (see the sketch after this list)
  . Transforms a user-defined workflow into multiple HPC jobs
  . Cooperates with the resource management software
- Bio workflow analysis tool
  . Helps to find out the characteristics and resource requirements of a workflow
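The slides do not show the workflow schema or the execution engine's interface, so the following is only a minimal Python sketch of the idea: parse an XML-defined workflow and submit each step as a dependent batch job. The XML tags, the example commands, and the SLURM-style `sbatch` submission are all assumptions, not part of MAHA System Workplace.

```python
# Hypothetical sketch: turn an XML-defined bio workflow into a chain of dependent
# batch jobs. The XML schema and the SLURM-style `sbatch` call are assumptions;
# MAHA's actual engine and workflow model are not described in the slides.
import subprocess
import xml.etree.ElementTree as ET

WORKFLOW_XML = """
<workflow name="ngs-pipeline">
  <step id="align"   command="bwa mem ref.fa reads.fq > aln.sam"/>
  <step id="sort"    command="samtools sort aln.sam -o aln.bam" depends="align"/>
  <step id="mpileup" command="samtools mpileup aln.bam > out.pileup" depends="sort"/>
</workflow>
"""

def submit(command, dependency_job_id=None):
    """Submit one workflow step as a batch job, chained after its dependency."""
    args = ["sbatch", "--parsable"]
    if dependency_job_id:
        args.append("--dependency=afterok:" + dependency_job_id)
    args += ["--wrap", command]
    out = subprocess.run(args, capture_output=True, text=True, check=True)
    return out.stdout.strip()          # --parsable makes sbatch print only the job id

def run_workflow(xml_text):
    job_ids = {}
    for step in ET.fromstring(xml_text).findall("step"):
        dep = step.get("depends")
        job_ids[step.get("id")] = submit(step.get("command"), job_ids.get(dep))
        print("step", step.get("id"), "-> job", job_ids[step.get("id")])

if __name__ == "__main__":
    run_workflow(WORKFLOW_XML)
```

In a real engine the scheduler interaction would go through the resource management software mentioned above rather than shelling out once per step.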

MAHA System Workplace: Function (2/3)

Resource Management for Bio Applications
- Job scheduling & resource allocation
  . End user's view: process a workflow as fast as possible
  . System's view: process as many workflows as possible in a given time
- Support for execution of bio applications
  . Solve performance problems by analyzing the characteristics of bio applications

MAHA System Workplace: Function (3/3)

Integrated Cluster Management for the MAHA System & Services
- Provisioning management
- Cluster operation management
- Monitoring management
- Service management (including MAHA System Workplace itself)
- Web UI for MAHA System Workplace

Outline
1. MAHA HW
2. MAHA SW
3. MAHA Storage System

Objective of MAHA-FS

Distributed file system for HPC applications, especially genome analysis
- Upgrades the performance of GLORY-FS (developed by ETRI)
- Provides performance competitive with Lustre (700 Gbps, 1 million IOPS)
- Compatible with a wide range of existing genome analysis applications

Features and Benefits of MAHA-FS

Features
- Hybrid storage
  . High performance per cost with SSDs (700 Gbps, 1 million IOPS)
  . High capacity per cost with HDDs (more than a petabyte)
- Low power consumption
  . Reduce storage power consumption by cutting off un-accessed HDDs (up to 50%); a MAID-style policy is sketched after this slide

Benefits
- For genome analysts
  . Reduced TCO for large-scale storage
  . No need to modify existing genome analysis applications
- For administrators
  . Easy management of peta-scale storage
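The slides only say that un-accessed HDDs are slowed down, put to sleep, or powered off according to their access rate; the actual MAHA-FS mechanism is not described. A minimal illustrative sketch of such a MAID-style policy, using the standard Linux `hdparm` standby/sleep commands and made-up access-rate thresholds:

```python
# Illustrative MAID-style power policy; the thresholds and device list are made up,
# and MAHA-FS's real power-control logic is not described in the slides.
# `hdparm -y` puts a drive into standby (spun down); `hdparm -Y` puts it to sleep.
import subprocess

ACCESS_RATE_STANDBY = 10.0   # accesses/hour below which the disk is spun down
ACCESS_RATE_SLEEP = 1.0      # accesses/hour below which the disk is put to sleep

def apply_power_state(device, accesses_per_hour):
    """Pick a power state for one data disk based on its recent access rate."""
    if accesses_per_hour < ACCESS_RATE_SLEEP:
        subprocess.run(["hdparm", "-Y", device], check=True)   # lowest power, slowest wake-up
        return "sleep"
    if accesses_per_hour < ACCESS_RATE_STANDBY:
        subprocess.run(["hdparm", "-y", device], check=True)   # spin down, faster wake-up
        return "standby"
    return "active"                                            # leave the disk spinning

if __name__ == "__main__":
    # In MAHA-FS these statistics would come from the file system's per-disk counters.
    access_rates = {"/dev/sdb": 0.2, "/dev/sdc": 4.5, "/dev/sdd": 120.0}
    for dev, rate in access_rates.items():
        print(dev, "->", apply_power_state(dev, rate))
```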

Performance & Capacity Considerations for the NGS Workload

A peak of 933 MB/s is required for one human genome analysis (I/O speed equivalent to 10 SATA HDDs)
- Total runtime on one computing node: 3d 14h 5m 32s
- Pipeline stages and I/O bandwidth (peak / average), in pipeline order:
  . Align: 681.63 / 41.82 MB/s
  . Sample: 933.50 / 332.83 MB/s
  . Sort: 68 / 19.6 MB/s
  . merge: 52.25 / 11.63 MB/s
  . mpileup: 76.38 / 2.93 MB/s

A total of 1.2 TB of capacity is required for one human genome analysis (on the shared storage, MAHA-FS)
- Reference genome: 11 GB
- Source data: 218 GB
- Temporary data: 96 GB
- Intermediate result data: ?? GB
- Final result data: 819 GB

NGS: Next Generation Sequencing
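As a rough cross-check (the intermediate result size is not given above), the listed data sets alone already account for most of the quoted 1.2 TB:

\[ 11 + 218 + 96 + 819 = 1144\ \text{GB} \approx 1.14\ \text{TB} \]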

Storage Architecture Considerations for the NGS Workload

MAHA-FS architecture
- MAHA-FS clients (compute nodes) connected over a 1/10G Ethernet / InfiniBand fabric
- MAHA-FS metadata server on commodity parts: server, chassis, RAID controller, 1G/10G/40G NIC, SATA SSD/HDD
- MAHA-FS storage servers on commodity parts: server, chassis, RAID controller, 1G/10G/40G NIC, SATA SSDs and/or SATA HDDs
- x86-based storage servers and SATA HDDs for lower TCO
- Resiliency supported by replication/migration built into the MAHA-FS software

Existing HPC architecture (Lustre; figures from NetApp)
- Lustre clients (compute nodes) connected over a 1/10G Ethernet / InfiniBand fabric
- Lustre metadata server with a metadata storage array (NetApp E2624)
- Lustre storage servers (active/active) with data storage arrays (NetApp E5460/DE6600 and/or E5424/DE5600, SAS HDDs) attached over a Fibre Channel SAN
- The external data storage arrays and the Fibre Channel fabric are the main cause of the high cost
- Resiliency supported by redundant server and storage hardware (no support for resiliency within Lustre itself)

MAHA Storage H/W Test-bed Status (2012)

Commodity SSD storage server
- Motherboard, 3 RAID controllers (8 SATA2 ports each) on PCI-E x4, backplane, SSDs over 3 Gbps SATA connections
- 10G Ethernet / 40G InfiniBand (VPI adapter)
- Total capacity built: 34 TB (192 SATA SSDs across 10 servers)
- Total capacity planned: 85 TB (2015)

Commodity HDD storage server
- Motherboard, 1 RAID controller (16 SATA2 ports) on PCI-E x4, backplane, HDDs over 3 Gbps SATA connections
- 10G Ethernet / 40G InfiniBand (VPI adapter)
- Total capacity built: 160 TB (160 SATA HDDs across 9 servers)
- Total capacity planned: 400 TB (2015)
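The built capacities imply the following per-device sizes (not stated explicitly on the slide):

\[ \frac{160\ \text{TB}}{160\ \text{HDDs}} = 1\ \text{TB per HDD}, \qquad \frac{34\ \text{TB}}{192\ \text{SSDs}} \approx 177\ \text{GB per SSD} \]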

MAHA File System SW Architecture

<Figure: MAHA-FS software architecture (client, metadata server, data server)>

- Light-weight metadata access protocol (NFS-like protocol)
- User-level file system: no kernel patch or dependency; the MAHA-FS client (FS client, LMD interface, hybrid I/O) runs in user space on top of FUSE, reached from applications through the VFS & cache, /dev/fuse, and the FUSE kernel module on the Linux client
- MAHA-FS metadata server: MDS server and metadata server core with MySQL and the NMD engine, plus the management protocol, data-server client interface, and heartbeat
- Light-weight metadata engine (NMD): Berkeley DB-like engine optimized for file system metadata (about 10 times faster)
- MAHA-FS data server: data server core on top of ext4 in the Linux kernel, with metadata-server client, hybrid I/O, file system, and /proc interfaces
- Hybrid I/O: dynamic selection between two I/O protocols based on the workload characteristics (a selection heuristic is sketched below)
  . Sequential-I/O-optimized protocol
  . Random-I/O-optimized protocol
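The slide names the two protocols but not the selection rule. A minimal sketch of what such a per-request selection could look like, with a made-up request-size and contiguity heuristic (not MAHA-FS's actual logic):

```python
# Hypothetical hybrid I/O protocol selector; the 1 MiB threshold and the contiguity
# test are assumptions, not MAHA-FS's documented behavior.
from dataclasses import dataclass

SEQUENTIAL_MIN_REQUEST = 1 << 20   # 1 MiB: treat large requests as streaming I/O

@dataclass
class IORequest:
    offset: int
    length: int

class HybridIOSelector:
    """Pick the sequential- or random-optimized protocol for each request."""

    def __init__(self):
        self.next_expected_offset = None

    def choose(self, req):
        # A request is "sequential" if it continues where the last one ended,
        # or if it is large enough to behave like streaming I/O anyway.
        contiguous = (self.next_expected_offset == req.offset)
        self.next_expected_offset = req.offset + req.length
        if contiguous or req.length >= SEQUENTIAL_MIN_REQUEST:
            return "sequential-optimized"
        return "random-optimized"

if __name__ == "__main__":
    selector = HybridIOSelector()
    for req in [IORequest(0, 4 << 20), IORequest(4 << 20, 4 << 20), IORequest(123, 4096)]:
        print(req, "->", selector.choose(req))
```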

Overall Performance of MAHA-FS v1.0

Overall performance results (Jan. 2013)
- Aggregate sequential I/O: > 70 Gbps; aggregate random I/O: > 20 million IOPS
- Metadata performance: > 100,000 open/sec, > 50,000 create/sec

<Figure: measured results vs. the 2012 target metrics>

Micro-benchmark Results of MAHA-FS v1.0

Metadata performance (1st result, Sep. 2012)
- About 4 times faster than Lustre (but needs more careful examination)
  . File creation: 52,437 ops/sec, 3.4 times better than Lustre's measured 15,000 ops/sec
  . File open: 116,005 ops/sec, 4.6 times better than Lustre's announced 25,000 ops/sec (9,000 ops/sec measured)
- Looks faster, but needs a closer look

Micro-benchmark Results of MAHA-FS v1.0

Data I/O performance (Sep. 2012, Jan. 2013)
- Still struggling to achieve better performance
- Additional tuning and testing is ongoing

NGS Pipeline Benchmark Results of MAHA-FS v1.0

NGS-pl benchmark results (1st result, Dec. 2012)
- Slightly faster than Lustre, but slower than NFS
- Benchmark storage environments: six NGS pipeline analysis applications (NGS1-NGS6) running either against two isolated NFS servers (NFS1, NFS2) or against a parallel/distributed file server (Lustre/MAHA-FS) with two data servers (DS1, DS2)
- Just the 1st result, no more, no less

<Figure: comparison of file systems for the NGS-pl workload (runtime in hours)>

Thank You