Data storage services at KEK/CRC -- status and plan

KEK/CRC, Hiroyuki Matsunaga
(Most of the slides were prepared by Koichi Murakami and Go Iwai)

KEKCC System Overview

KEKCC (Central Computing System)
- The only large cluster system at KEK that supports a variety of projects (we also operate supercomputers for different user communities)
- In operation since April 2012
- Originally planned as a 3.5-year lease (until Aug. 2015); the lease has been extended by one year (until Aug. 2016)
- Such a large system is fully replaced every 4-5 years through bidding, following the Japanese government procurement system
- Migrated to GHI (GPFS-HPSS Interface) from the VFS / client-API / PFTP access methods, a big challenge for HPSS / HSM

CPU
- Work (interactive) servers and batch servers: IBM System x iDataPlex
- Xeon X5670 (2.93 GHz, 3.33 GHz with Turbo Boost, 6 cores), 2 CPUs/node, 4080 cores in total
- 282 nodes with 4 GB memory/core, 58 nodes with 8 GB memory/core
- Scientific Linux 5
- Interconnect: InfiniBand 4x QDR (4 GB/s), RDMA connection to the storage system
- Job scheduler: LSF (ver. 8, upgraded to ver. 9 last Aug.), scalable up to 1M jobs
- Grid deployment: EMI CREAM CE; work servers also act as UI, batch servers as WN
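As a rough illustration of how users interact with the batch service, the sketch below submits a script to LSF with bsub from a work (UI) server. The queue name, core count and script path are hypothetical placeholders, not the actual KEKCC configuration.

```python
#!/usr/bin/env python3
# Minimal sketch of submitting an analysis job to LSF from a work (UI) server.
# Queue name, core count and script path are hypothetical; real queue names
# and resource limits are site-specific.
import subprocess

def submit(script, queue="s", ncores=1, jobname="analysis"):
    """Submit a script with bsub and return the raw bsub output."""
    cmd = [
        "bsub",
        "-q", queue,                 # target batch queue
        "-n", str(ncores),           # number of job slots
        "-J", jobname,               # job name
        "-o", jobname + ".%J.out",   # stdout/stderr log (%J = LSF job ID)
        script,
    ]
    return subprocess.check_output(cmd).decode()

if __name__ == "__main__":
    print(submit("./run_analysis.sh"))
```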

Disk
- DDN SFA10K x 6 racks (3 TB HDDs)
- Capacity: 1152 TB x 6 = 6.9 PB (effective)
- Throughput: 12 GB/s x 6
- Used for GPFS and the GHI staging area (3 PB each)
GPFS file system
- Parallel file system, total throughput > 50 GB/s
- Optimized for massive access: many file servers, no-bottleneck interconnect (RDMA-enabled), separation of the metadata area (on SSD), larger block size
- Performance: > 500 MB/s for single-file I/O in a benchmark test
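As a sanity-check illustration of the quoted single-file I/O figure, a minimal sketch like the one below can measure sequential write/read throughput on a GPFS mount. This is not the benchmark behind the >500 MB/s number; the path and sizes are placeholders, and the read phase may be inflated by the page cache.

```python
#!/usr/bin/env python3
# Rough sequential write/read throughput check for a single file on a GPFS
# mount.  Illustrative only: path and sizes are placeholders, and the read
# phase may be served from the page cache.
import os
import time

def throughput(path, size_mb=4096, block_mb=8):
    block = b"\0" * (block_mb * 1024 * 1024)
    nblocks = size_mb // block_mb

    t0 = time.time()
    with open(path, "wb") as f:
        for _ in range(nblocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())         # make sure data reached the file system
    write_mbps = size_mb / (time.time() - t0)

    t0 = time.time()
    with open(path, "rb") as f:
        while f.read(block_mb * 1024 * 1024):
            pass
    read_mbps = size_mb / (time.time() - t0)

    os.remove(path)
    return write_mbps, read_mbps

if __name__ == "__main__":
    w, r = throughput("/gpfs/group/scratch/tp_test.dat")  # hypothetical path
    print("write %.0f MB/s, read %.0f MB/s" % (w, r))
```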

Tape
- Tape library: IBM TS3500, max. capacity 16 PB
- Tape drives: IBM TS1140, 60 drives (latest enterprise drive); LTO was not chosen because of its lower reliability
- Tape media: JC (4 TB, 250 MB/s), JB (1.6 TB after repack, 200 MB/s)
- Tapes are provided by each user/group

HSM (Hierarchical Storage Management)
- HPSS: disk (first layer) + tape (second layer); we have experience from the previous KEKCC
- Improvements over the previous system: more tape drives, faster I/O per tape drive, faster interconnect (10GbE, IB), better staging area (capacity, access speed), integration with the GPFS file system (GHI)
GHI (GPFS-HPSS Interface)
- GPFS as the staging area
- Full coherence with GPFS access (POSIX I/O); no use of the HPSS client API or the VFS interface
- Takes advantage of the high-performance I/O of GPFS
- Grid SE and iRODS access go through GHI
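Because GHI exposes staged data through GPFS with ordinary POSIX semantics, user code needs no HPSS client API at all. The sketch below reads a file with plain open()/read() and uses a crude block-count heuristic to guess whether the file is already staged on disk; the heuristic and the path are assumptions, not an official GHI residency query.

```python
#!/usr/bin/env python3
# GHI keeps HSM-managed files visible through GPFS with ordinary POSIX
# semantics, so user code simply calls open()/read().  The "is it staged?"
# check below is only a heuristic based on allocated blocks (a purged,
# tape-only file usually has far fewer blocks than its size implies);
# the path is a hypothetical example.
import os

def looks_staged(path):
    st = os.stat(path)
    allocated = st.st_blocks * 512       # bytes actually allocated on disk
    return allocated >= st.st_size       # ~fully allocated: likely staged

def read_first_mb(path):
    with open(path, "rb") as f:          # plain POSIX I/O, no HPSS client API
        return f.read(1024 * 1024)

if __name__ == "__main__":
    p = "/ghi/belle/raw/run0001.dat"     # hypothetical GHI-managed path
    print(p, "staged" if looks_staged(p) else "probably on tape only")
```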

Storage Management

Data Management Cycle
- Different use cases: big data, different kinds of data sets (HEP, material science)
HEP
- Raw data: experimental data from detectors
  - Written to HSM in real time (through several buffer stages); high availability is required
  - Cold data (PB to hundreds of PB), reprocessed from time to time; reprocessing frequency depends on the stage of the experiment
- DST (summary data): for data analysis
  - Hot data (tens of PB), accessed from jobs
  - Data-intensive processing; high I/O performance required: hundreds of MB/s of I/O, many concurrent accesses from jobs
Material science
- Large amounts of image data taken by detectors; a few PB/year expected
- Service as a data archive: easy accessibility matters more than I/O performance

Tape Storage Technology is Important
- We are facing a big-data challenge: hundreds of PB of data are expected in new HEP experiments. Belle II expects 200-300 PB/year at the peak luminosity of the SuperKEKB accelerator.
- We cannot afford a disk-only storage solution; tape storage costs much less in electricity.
- On the other hand, performance, usability and long-term preservation are also very important; middleware (HSM) in addition to hardware is a key point.
- GHI (GPFS + HPSS) is the best solution at present:
  - Data access with high I/O performance and good usability
  - Same access speed as GPFS once data are staged
  - No need for the HPSS client API nor for changes in user code
  - Aggregation of small files helps tape performance a lot

KEKCC Storage Configuration (GPFS / HPSS / GHI)

System Operations
- The storage system affects overall system performance and availability; operation is nearly stable, and the main focus is on performance issues.
- GPFS (massive access, load balancing, scalability): the file system occasionally gets unmounted unexpectedly, and many nodes are affected during recovery; a total outage of GPFS is critical.
- HPSS: almost stable, but occasional trouble with staging and data consistency.
- GHI: less stable, or problematic; unclear interaction between GPFS and GHI can cause GPFS instability, staging outages and data loss.

History of Data Stored in HPSS: over 5 PB added in the past 1.5 years

File Size Distribution (HPSS statistics)
- 142 million files stored in HPSS
- Files under 8 MB are aggregated with HTAR to facilitate tape operation
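The idea behind the HTAR aggregation, bundling files below the 8 MB threshold so the tape layer sees a few large sequential objects instead of many tiny ones, can be sketched as follows. The threshold matches the slide, but the paths, bundle size and naming are illustrative only; the real aggregation is done by HTAR inside HPSS.

```python
#!/usr/bin/env python3
# Sketch of the aggregation idea: bundle files smaller than 8 MB into tar
# archives so the tape layer handles a few large objects instead of many
# tiny ones.  At KEKCC this is done by HTAR inside HPSS; paths, bundle size
# and naming here are illustrative only.
import os
import tarfile

SMALL = 8 * 1024 * 1024                  # 8 MB threshold, as on the slide

def bundle_small_files(src_dir, out_dir, bundle_size=4 * 1024**3):
    os.makedirs(out_dir, exist_ok=True)
    bundle, used, idx = None, 0, 0
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not os.path.isfile(path) or os.path.getsize(path) >= SMALL:
            continue                     # large files are archived as-is
        if bundle is None or used >= bundle_size:
            if bundle:
                bundle.close()
            idx += 1
            bundle = tarfile.open(os.path.join(out_dir, "small_%04d.tar" % idx), "w")
            used = 0
        bundle.add(path, arcname=name)
        used += os.path.getsize(path)
    if bundle:
        bundle.close()

if __name__ == "__main__":
    bundle_small_files("/gpfs/group/user/smallfiles", "/gpfs/group/user/bundles")
```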

Grid Setup
- EMI middleware; StoRM is the main SE, and DPM will be retired in the near future.
- Link bandwidths shown in the diagram: 10 Gbps to remote sites; 16 Gbps (1 Gbps x 4 x 4); 128 Gbps (32 Gbps x 4) toward the HSM.
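For users, transfers to the StoRM SE typically go through standard Grid tools such as gfal-copy; the minimal sketch below wraps such a copy. The SRM endpoint and paths are placeholders, not the real KEK endpoints.

```python
#!/usr/bin/env python3
# Minimal sketch of uploading a local file to the StoRM SE with gfal-copy.
# The SRM endpoint and paths are placeholders, not the real KEK endpoints.
import subprocess

def upload(local_path, srm_url):
    subprocess.check_call(["gfal-copy", "file://" + local_path, srm_url])

if __name__ == "__main__":
    upload("/gpfs/group/user/output.root",
           "srm://se.example.jp:8444/srm/managerv2?SFN=/belle/user/output.root")
```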

Storage Capacity for Grid (charts for all VOs, the Belle VO (Belle & Belle II), and the ILC VO)

Throughputs for external traffic

Throughputs for internal traffic

Reduce the Risk of Tape-Data Loss
- Some data-loss incidents have occurred, for different reasons: software bugs, media errors, human errors.
- Many users think of tape as a cold, safer medium for preserving data over a long period. How do we reduce the data-loss risk?
- Hot media might actually be safer, since the hardware health checks seem to work well so far: if media degradation is detected, the media is soon copied out to another volume.
- Cold media might be potentially in danger: media can be damaged without notice, due to degradation, dust intrusion, etc.
- Important data should be stored redundantly: write data to multiple media, via parallel writing or re-migration. Users should pay the extra media cost to prevent unexpected data loss.
- Data-integrity features in HPSS could help: checksums, end-to-end integrity.
- Use a trash can against human errors?
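One way to get the end-to-end integrity mentioned above is to record a checksum when a file is first written and verify it again after recall from tape. The sketch below illustrates this with SHA-256 and a simple JSON manifest; the manifest format and paths are assumptions, independent of the checksum support inside HPSS itself.

```python
#!/usr/bin/env python3
# Sketch of an end-to-end checksum workflow: record a SHA-256 checksum when a
# file is written, verify it again after the file comes back from tape.
# The JSON manifest and the paths are illustrative assumptions.
import hashlib
import json
import os

def sha256sum(path, bufsize=8 * 1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def record(path, manifest="checksums.json"):
    db = {}
    if os.path.exists(manifest):
        with open(manifest) as f:
            db = json.load(f)
    db[path] = sha256sum(path)
    with open(manifest, "w") as f:
        json.dump(db, f, indent=2)

def verify(path, manifest="checksums.json"):
    with open(manifest) as f:
        db = json.load(f)
    return db.get(path) == sha256sum(path)

if __name__ == "__main__":
    record("/ghi/exp/raw/run0001.dat")         # before migration to tape
    print(verify("/ghi/exp/raw/run0001.dat"))  # after recall from tape
```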

Data Explosion and Migration
- Hundreds of PB of data are anticipated in the next several years, and the system is replaced every ~4 years.
- The HSM system (or tape library) may change as a result of future bidding; if that happens, data migration becomes a big issue. How do we do it efficiently and safely?
- We expect 6-12 months for migration, but want to minimize the migration period for operational and lease-cost reasons; two storage systems have to run in parallel during the data migration.
- Technical issues: checksums should be managed in HPSS for data integrity, and reading of tape data should be optimized to follow the recorded order.
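Reading tape data back in the recorded order essentially means grouping recall requests by tape volume and sorting them by position so each tape is read in one forward pass. The sketch below illustrates that ordering step; the (volume, position) metadata is assumed to come from the HSM catalogue, and the data structures are illustrative.

```python
#!/usr/bin/env python3
# Sketch of ordering recall requests for a bulk migration: group files by tape
# volume and sort them by their recorded position so each tape is read in a
# single forward pass.  The (volume, position) metadata is assumed to come
# from the HSM catalogue; the example data are made up.
from collections import defaultdict

def plan_recalls(files):
    """files: iterable of (path, volume, position) tuples."""
    by_volume = defaultdict(list)
    for path, volume, position in files:
        by_volume[volume].append((position, path))
    return {vol: [path for _, path in sorted(entries)]
            for vol, entries in by_volume.items()}   # volume -> on-tape order

if __name__ == "__main__":
    catalogue = [("/data/a.dat", "JC0001", 42),
                 ("/data/b.dat", "JC0002", 7),
                 ("/data/c.dat", "JC0001", 3)]
    for vol, order in plan_recalls(catalogue).items():
        print(vol, order)
```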

Summary
- We have been an HPSS user for many years; GHI has been in operation since 2012.
- The tape system is an important technology for us, in terms of both hardware and software.
- GHI is a promising HSM solution for large-scale data processing. The performance of HPSS / GHI is good, but the stability of GHI should be improved.
- Scalable data management is a challenge for the next several years. The system will be upgraded two years from now; design and specification of the new system will start soon.