GPFS for Life Sciences at NERSC

A NERSC & JGI collaborative effort
Jason Hick, Rei Lee, Ravi Cheema, and Kjiersten Fagnan
GPFS User Group meeting, May 20, 2015

Overview of Bioinformatics

A High-level Summary
JGI science spans bioenergy, carbon cycling, biogeochemistry, metagenomes, plants, fungi, and microbes, supported by DNA sequencing, DNA synthesis, advanced genomic technologies, and computational analysis.

Metagenome Analysis
Samples are taken from the wild, microbial DNA is extracted and sequenced, and the sequencer generates reads (short segments of DNA). Genome assembly turns the reads into contigs, which are then annotated and used to determine community composition.

JGI and the NERSC partnership

JGI and NERSC partnership
- The U.S. DOE's Joint Genome Institute (JGI) is a leading bioinformatics research facility, with good relations with other established bioinformatics facilities, such as WUSTL, from which it receives guidance.
- NERSC accelerates scientific discovery by providing high performance computing and storage as a U.S. DOE user facility.
- The two partnered in 2010 to consolidate computing and storage infrastructure at the NERSC facility.
- At the time of the partnership, JGI had:
  - Numerous group-owned clusters
  - Around 21 NFS-based file systems, serving predominantly as archival storage
  - Two Isilon file systems handling sequencing storage, cluster output, and desktop storage

Initial assessment
- Utilization on the group clusters was sporadic, and increases in sequencing were difficult to translate into computing needs.
  - Heterogeneous jobs, predominantly high throughput computing
- Needed a centralized cluster to provide a scalable way to make use of additional sequencing.
  - Consolidated onto a new fair-share cluster called Genepool
- The pre-existing ethernet interconnect presented challenges.
  - Serial workloads preferred
  - Under-provisioned, causing high contention for storage
- The file systems were preventing them from scaling up (2 PB with 1B files).
  - Regular hangs and timeouts
  - Bandwidth was too low
  - Metadata performance was low and accounting didn't complete
  - Backups not completing
- No storage system administrator to help resolve these issues.

Initial GPFS deployment

Match different workloads to different storage systems
- Retired the NetApp filers by migrating data to HPSS archival storage.
  - 21 NetApp filers, 7 years or older
  - Users were resistant at first, but were surprised at the performance!
  - JGI developed their own data management interface (JAMO) that moves data automatically between file system and archive.
- Introduced a new GPFS scratch file system to the Genepool cluster.
  - Alleviated load on the existing Isilon file system, allowing us to decide how to use it moving forward.
- Implemented fileset quotas to help JGI balance and manage their storage allocations.
- Implemented a new purge policy, in combination with archival storage, to establish new user-based data management (see the quota and purge sketch below).
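To make the fileset-quota and purge ideas concrete, here is a minimal, hypothetical sketch using standard GPFS administration commands and the ILM policy language. The fileset name, quota limits, 90-day threshold, and paths are invented for illustration (they are not JGI's actual settings), and the mmsetquota form shown is the newer GPFS 4.1-style syntax; older releases used mmedquota instead.

    # Create and link a fileset, then give it block and inode quotas
    mmcrfileset projectb plants
    mmlinkfileset projectb plants -J /projectb/projects/plants
    mmsetquota projectb:plants --block 10T:12T --files 5M:6M

    # A purge rule: delete scratch files not accessed in 90 days
    cat > /tmp/purge.policy <<'EOF'
    RULE 'purge_scratch' DELETE
      WHERE PATH_NAME LIKE '/projectb/scratch/%'
        AND (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 90
    EOF
    # Trial run first; switch to -I yes to actually delete
    mmapplypolicy projectb -P /tmp/purge.policy -I test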

What a diverse workload
- Now we're getting to the bottom of it.
- Lots of churn: generating O(100M) KB-sized files and then deleting them.
  - We haven't addressed this one yet.
  - Especially challenging with the use of snapshots and quotas.
- Users requesting 10s of GB per second of bandwidth.
  - Encouraged use of a separate file system whose sole purpose is large-file I/O (10s of GB per second).
- Production sequencing is very important not to disrupt.
  - Peak demand is about 5 TB per day.
  - Created another file system called seqfs, with a very limited number of mounts, to handle sequencer runs nearly exclusively.
- Many desire read-caching for their workload (BLAST).
  - The GPFS cache was getting blown out by writes; the algorithm is not good for reads.
  - Created another file system called dna, mounted predominantly read-only (see the mount sketch below).
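For a read-mostly file system like dna, a read-only mount on the compute nodes can be expressed with the standard GPFS mount command; the device name and node classes below are placeholders, not the exact Genepool configuration.

    # Mount dna read-only on the compute nodes,
    # and read-write only where data is staged (e.g. the xfer nodes)
    mmmount dna -o ro -N compute_nodes
    mmmount dna -N xfer_nodes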

Complex software environment
- The Genepool system has over 400 software packages installed.
- Different users require alternative versions of the software.
- The storage problem here: all users/jobs care how quickly their software loads!

Specific Improvements

Implement disaster protection
- Enabled GPFS snapshots (see the snapshot sketch below).
  - Aid in recovering data from accidental user deletes.
- Backups to HPSS.
  - Custom software we call PIBS performs incremental backups on PB-sized file systems.
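Snapshot creation itself is routine GPFS administration; a minimal sketch follows (the file system name and snapshot naming scheme are illustrative, and the PIBS backup software is custom NERSC code not shown here).

    # Take a dated snapshot of projectb and list the existing snapshots
    mmcrsnapshot projectb daily_$(date +%Y%m%d)
    mmlssnapshot projectb

    # Users can recover an accidentally deleted file by copying it back
    # out of the read-only .snapshots directory at the file system root
    cp /projectb/.snapshots/daily_20150520/scratch/someuser/reads.fa /projectb/scratch/someuser/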

Optimizing GPFS for an ethernet cluster
- Adjust TCP kernel settings (example sysctls below).
  - Need smaller initial send buffers: net.ipv4.tcp_wmem, net.ipv4.tcp_rmem
- Prevent head-of-line blocking.
  - We saw congestion-like symptoms without congested traffic, a result of flow control.
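As an illustration of the tuning involved, the buffer sysctls named above can be adjusted as shown; the values are generic examples (note the smaller middle value, which sets the initial buffer size), not the settings NERSC actually deployed.

    # Example only: shrink the initial (middle) TCP send/receive buffers
    sysctl -w net.ipv4.tcp_wmem="4096 8192 4194304"
    sysctl -w net.ipv4.tcp_rmem="4096 32768 4194304"
    # Persist the same values in /etc/sysctl.conf to survive reboots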

GPFS and unsupported kernels
- JGI preferred Debian and initially used Debian 5 with GPFS 3.3 (see the rebuild sketch below).
  - This was a bad idea.
  - The symptom was a high volume of expels.
  - Memory errors were causing GPFS asserts.
- Switched to Debian 6 with GPFS 3.4.
  - All memory errors ceased.
  - Drastically reduced the number of expels.
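Running GPFS on a distribution outside the official support matrix also means rebuilding the portability layer by hand after every kernel update. The classic build steps of that era are sketched below for illustration; paths and targets are the GPFS defaults.

    # Rebuild the GPFS portability layer against the running kernel
    cd /usr/lpp/mmfs/src
    make Autoconfig       # detect the kernel version and header locations
    make World            # build the portability-layer kernel modules
    make InstallImages    # install the freshly built modules
    # restart GPFS on the node to load them
    mmshutdown && mmstartup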

GPFS caching
- Life sciences codes prefer user-space allocations.
  - We disabled swap, which was key to preventing out-of-memory problems on their compute cluster.
  - Experimented with increasing the pagepool (sketch below), but it didn't help the broader life sciences workload.
- Moved to a CNFS approach for better read caching.
  - Works for a broader set of workloads.
  - However, it is unknown whether this scales to a whole file system, so we are limiting it to a specific subdirectory/fileset of the GPFS file system.
  - We run separate CNFS servers to isolate NFS traffic from native GPFS.
- We would be interested in new options for tuning/using memory in GPFS.
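The pagepool experiment is a one-line configuration change; the size and node class here are illustrative rather than the values used on Genepool.

    # Grow the GPFS pagepool on the compute nodes and verify the setting
    # (takes effect once GPFS is restarted on those nodes)
    mmchconfig pagepool=8G -N compute_nodes
    mmlsconfig pagepool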

Moving from ethernet to IB
- Genepool has nodes on either ethernet or IB.
  - IB expels are much less frequent and performance is more consistent, but the challenges are routing/topology (see the configuration sketch below).
- To isolate and scale GPFS separately from the compute IB fabric, we are deploying custom gateway servers that route between the compute IB and storage IB fabrics.
- We are also deploying custom gateway servers to route ethernet traffic over the storage IB fabric.
- Ethernet flow control/fair share and a normal (L1/L2) architecture do not let GPFS perform adequately for JGI workloads.
  - Detuning stabilizes GPFS for availability (i.e., it eliminates expels), but performance was less than 1 GB/sec per compute node, with higher variability.
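As an aside, where IB-attached clients reach the storage natively, GPFS can use RDMA (verbs) directly; enabling it looks roughly like this, with the HCA/port string and node class as placeholders for whatever the actual nodes have.

    # Enable RDMA over InfiniBand for the GPFS daemon on IB-attached nodes
    mmchconfig verbsRdma=enable -N ib_nodes
    mmchconfig verbsPorts="mlx4_0/1" -N ib_nodes
    mmlsconfig | grep -i verbs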

Resulting Architecture Today

JGI's Compute Cluster at NERSC
- User access: command line (ssh genepool.nersc.gov, login nodes), scheduler service, web services at http://jgi-psf.org
- Node types: gpint nodes, database services, compute nodes, high priority & interactive nodes, FPGA
- File systems: /homes, /projectb, /seqfs, /dna, archive, /software

JGI Data Storage
- /projectb (2.6 PB): SCRATCH/sandboxes ("the Wild West"; write here from compute jobs) and working directories
- WebFS (100 TB): small file system for web services, mounted on gpwebs and in the xfer queue
- DnA (1 PB): shared data; project directories, finished products, NCBI databases, etc.; read-only on compute nodes, read-write in the xfer queue
- SeqFS (500 TB): sequencer data; file system accessible to the sequencers at JGI

Future Directions

Desired: root-cause analysis of expels
- Explored using GPFS callbacks to collect data on node expels (see the callback sketch below).
  - Ultimately we want to determine the health of the node.
  - Gathered network-interface counters and sent IB/ethernet pings.
- However, there is still no central method of detecting issues with remote clusters (the issue is only sent to the remote cluster manager).
- GPFS 4.1 sends notifications of congestion to the owning cluster, a major improvement, but still not enough to determine node health.
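Hooking a data-collection script to a cluster event is done with mmaddcallback; a minimal, hypothetical sketch follows, where the callback name, script path, and parameter choice are assumptions for illustration.

    # Run a local data-collection script whenever a node leaves the cluster
    mmaddcallback expelDataCollector \
        --command /usr/local/sbin/collect_expel_data.sh \
        --event nodeLeave \
        --parms "%eventNode %myNode"
    mmlscallback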

Other initiatives underway
- Scheduler upgrade/enhancements
  - Consider better features for job dependencies
  - Optimizations for the high throughput workload
- HPC Initiatives study
  - Identify the changes necessary to enable the bioinformatics workload to run on the largest HPC systems
- Workflow software
  - Help manage work external to the compute system scheduler
- Data management tools
  - Evaluating different software for loose coupling of the GPFS and HPSS systems (SRM/BeStMan, iRODS, GPFS Policy Manager, ...)
  - Consider small-file optimizations
- File System Acceleration (FSA) using DataDirect Networks' Infinite Memory Engine (IME) in front of GPFS

Summary
- Life sciences workloads are:
  - Predominantly high throughput computing; we consider it data intensive computing
  - Diverse in their demands on file systems
- Segregating workloads into separate file systems was extremely helpful (latency-sensitive vs. bandwidth-demanding, optimizations for reading).
- They benefit from using archival storage (e.g. HPSS) to improve data management.
  - This required special data management software; JGI developed their own solution, called JAMO, to move data between archive and file system.
- Drastic availability improvements when shifting to IB from ethernet.
- GPFS works well for the JGI.
- We are eager to explore ideas for isolating small-file workloads.