Storage Virtualization. Eric Yen Academia Sinica Grid Computing Centre (ASGC) Taiwan


Storage Virtualization

In computer science, storage virtualization uses virtualization to provide better functionality and more advanced features in computer data storage systems. Within a storage system, two primary types of virtualization can occur:

Block virtualization refers to the abstraction (separation) of logical storage (a partition) from physical storage, so that it may be accessed without regard to the physical layout or heterogeneous structure underneath. This separation gives the administrators of the storage system greater flexibility in how they manage storage for end users.[1]

File virtualization addresses NAS challenges by eliminating the dependency between data accessed at the file level and the location where the files are physically stored. This creates opportunities to optimize storage use, consolidate servers, and perform non-disruptive file migrations.
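The block-virtualization idea above can be sketched in a few lines: a mapping layer translates logical block addresses, as seen by a host, onto physical extents spread across heterogeneous devices. This is a minimal illustrative model, not any particular product's implementation; all names are made up.

```python
# Minimal sketch of block virtualization: a mapping layer separates the
# logical address space from the physical devices backing it.

class BlockVirtualizer:
    """Maps a contiguous logical address space onto physical extents."""

    def __init__(self):
        # Each entry: (logical_start, length, device, physical_start)
        self.mapping = []
        self.next_logical = 0

    def add_extent(self, device, physical_start, length):
        """Grow the logical volume by one physical extent."""
        self.mapping.append((self.next_logical, length, device, physical_start))
        self.next_logical += length

    def resolve(self, logical_block):
        """Translate a logical block address to (device, physical block)."""
        for start, length, device, phys in self.mapping:
            if start <= logical_block < start + length:
                return device, phys + (logical_block - start)
        raise ValueError("logical block out of range")

# One logical volume transparently backed by two different devices:
vol = BlockVirtualizer()
vol.add_extent("disk_a", physical_start=1000, length=100)
vol.add_extent("disk_b", physical_start=0, length=200)
```

Because the host only ever sees logical addresses, the administrator can add or relocate physical extents without the end user noticing, which is exactly the flexibility the definition above describes.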

Managing the research data lifecycle (Source: Ian Foster, Globus)

1. Transfer: at a light source, the PI initiates a transfer request, or it is requested automatically by a script or science gateway; Globus transfers the files reliably and securely to a compute facility. It is SaaS: only a web browser is required, access uses your campus credentials, and Globus monitors and informs throughout.
2. Share: the PI selects files to share, selects a user or group, and sets access permissions. Globus controls access to the shared files on existing storage; there is no need to move files to cloud storage.
3. A researcher logs in and accesses the shared files from a personal computer; no local account is required; downloads go via Globus.
4. Publish: the researcher assembles a data set and describes it using metadata (Dublin Core and domain-specific); a curator reviews and approves; the data set is published on a campus or other publication repository.
5. Discover: peers and collaborators search for and discover datasets, then transfer and share them using Globus.
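The "share" step above hinges on one idea: an access-control layer sits over files that stay on existing storage, so granting access never means copying data to the cloud. A toy sketch of that model follows; the class and method names are illustrative, not the actual Globus API.

```python
# Hedged sketch of sharing-in-place: data stays on the endpoint,
# and an ACL decides who may download what. Names are illustrative.

class SharedEndpoint:
    def __init__(self, files):
        self.files = set(files)   # data stays where it already lives
        self.acl = {}             # path -> set of authorized users

    def grant(self, path, user):
        """PI selects a file and a user, and sets the access permission."""
        self.acl.setdefault(path, set()).add(user)

    def download(self, path, user):
        """Collaborator accesses a shared file; no local account needed."""
        if path in self.files and user in self.acl.get(path, set()):
            return f"transferring {path} to {user}"
        raise PermissionError(f"{user} may not access {path}")

ep = SharedEndpoint(["results/run42.h5"])
ep.grant("results/run42.h5", "collaborator@uni.edu")
```

Anyone not in the ACL is refused, while authorized collaborators download directly from the original storage, mirroring steps 2 and 3 of the workflow.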

General Analysis Workflow of LHC Computing (Source: Wouter Verkerke, NIKHEF)

[Diagram: simulation of soft-physics and high-energy physics processes feeds a simulation of the ATLAS detector; LHC data passes through reconstruction of the ATLAS detector; both paths meet in the analysis (event selection), comparing the observed m4l distribution against the prediction prob(data | SM), e.g. P(m4l | SM[mH]).]

Whole Genome Analysis pipeline (GATK)

Physics and genomics both have big data, but in almost every respect they differ greatly from each other. Genomics data is more complex and variable, and is used in more demanding ways; its growth is accelerating faster than that of physics data; there is greater uncertainty on short timescales, hence less time to respond; and there is less community-wide investment in software and infrastructure.

The Needs

Integrated global services: resources are aggregated and federated by cloud and distributed computing infrastructure; a data management platform and services for user communities across institutional boundaries.

Scalable infrastructure and analysis services: use any mountable storage system as the local data store, with standard POSIX access via a global mount point.

Scientific gateway requirements, for both big-data and long-tail sciences: data sharing, access, transmission, publication, discovery, archiving, etc. Combined with the infrastructure and middleware, this reduces the cost to create and maintain an ecosystem of integrated research applications.

Simplify research data management, for reuse and reproducibility: agility in adding and removing storage; geo-redirection; load balancing and failover; bridging to public clouds, e.g. Amazon S3 endpoints and OpenStack Ceph object storage.

Performance improvements: the cost of storage is much higher than the cost of compute, so better storage resource efficiency is needed, easily deployed and shared by and with APAN communities.
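The geo-redirection and failover requirements above boil down to a simple decision: redirect each client to the nearest replica that is currently healthy. A minimal sketch, assuming a static latency table; the site names and latencies are invented for illustration.

```python
# Toy geo-redirection with failover: pick the lowest-latency replica
# among those currently marked healthy. Sites/latencies are made up.

def pick_endpoint(replicas, healthy):
    """Choose the lowest-latency replica that is currently healthy.

    replicas: dict mapping site name -> latency in ms
    healthy:  set of site names that passed their last health check
    """
    candidates = [(lat, site) for site, lat in replicas.items() if site in healthy]
    if not candidates:
        raise RuntimeError("no healthy replica available")
    return min(candidates)[1]

replicas = {"asgc.tw": 5, "cern.ch": 120, "bnl.gov": 180}
```

If the nearest site drops out of the healthy set, the same call transparently fails over to the next-closest replica, which is the load-balancing/failover behaviour the list above asks for.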

Implementation: Using Cloud Services

Simplicity, reliability, economies of scale, scalability, flexibility, etc. Convergence of protocols: HTTP/DAV, plus APIs for data services, e.g. transfer, authentication and authorization, group membership, etc.
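The protocol convergence above centres on HTTP/DAV: the same storage can be listed and managed with plain HTTP verbs plus WebDAV extensions such as PROPFIND. A minimal sketch that builds (but does not send) such a request with the standard library; the endpoint URL is a placeholder.

```python
# Building a WebDAV PROPFIND request with the Python standard library.
# The URL is a placeholder; no network traffic is generated here.

import urllib.request

req = urllib.request.Request(
    "https://storage.example.org/dav/dataset/",  # placeholder endpoint
    method="PROPFIND",                           # WebDAV: list resource metadata
    headers={"Depth": "1"},                      # immediate children only
)
```

Because the access protocol is ordinary HTTP, the same endpoint can be reached from browsers, command-line tools, and data-service APIs alike, which is the point of converging on HTTP/DAV.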

Dynamic Federation (Source: DynaFed Project)


Virtualizing xrootd Storage: Extending the Metadata Network

Example of an asynchronous read/write container. A central container catalog sits alongside persistent storages P0 and P1; data and metadata can be cached in a container on volatile storage V2, in rack V2. The metadata update is asynchronous (lazy): metadata and data writes happen locally, metadata is asynchronously migrated into the central catalog, and data is migrated asynchronously, driven by replica policies, into the persistent storages P0 and P1. One WAN operation gets or restores the container metadata; namespace operations for local mounts stay on the LAN.
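The lazy metadata flow above can be sketched as a simulation: writes land in the local volatile container first, and the corresponding catalog updates are queued and migrated later in one batch (standing in for the single WAN operation). All structures here are illustrative, not the actual xrootd implementation.

```python
# Sketch of lazy (asynchronous) metadata migration: local writes are
# fast and LAN-only; the central catalog is updated later in a batch.

central_catalog = {}       # persistent, authoritative metadata (WAN)
volatile_container = {}    # cached metadata in rack V2 (LAN, not yet durable)
pending = []               # metadata updates awaiting migration

def local_write(name, size):
    """LAN-local write: update cached metadata, queue the catalog update."""
    volatile_container[name] = {"size": size}
    pending.append(name)

def migrate_metadata():
    """One batched (WAN) operation migrates queued metadata to the catalog."""
    while pending:
        name = pending.pop(0)
        central_catalog[name] = volatile_container[name]

local_write("evt_001.root", 2048)
local_write("evt_002.root", 4096)
```

Until `migrate_metadata()` runs, the central catalog lags behind the container, which is exactly the trade-off the slide describes: fast local operations at the cost of temporarily stale central metadata.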

Logical View of OSiRIS (HEPiX Spring 2016, 4/18/2016)

Ceph in OSiRIS (OSiRIS, HEPiX Fall 2016, Oct 19, 2016)

Ceph gives us a robust open-source platform to host our multi-institutional science data: it is self-healing and self-managing, offers multiple data interfaces, and sees rapid development supported by Red Hat. We are able to tune components to best meet specific needs. Software-defined storage gives us more options for automating data lifecycle management, and sophisticated allocation mapping (CRUSH) lets us isolate, customize, and optimize by science use case. Ceph overview: https://umich.app.box.com/s/f8ftr82smlbuf5x8r256hay7660soafk
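The key property of CRUSH mentioned above is that placement is computed, not looked up: any client can derive where an object lives from the cluster map alone, with no central directory in the data path. The sketch below illustrates that idea with a toy deterministic hash placement; it is not the real CRUSH algorithm, and the OSD names are invented.

```python
# Toy computed placement (CRUSH-like in spirit only): any client with
# the same OSD list computes the same replica locations for an object.

import hashlib

def place(obj_name, osds, replicas=2):
    """Deterministically choose `replicas` distinct OSDs for an object."""
    h = int(hashlib.sha256(obj_name.encode()).hexdigest(), 16)
    start = h % len(osds)
    # Take consecutive OSDs from the hashed starting point.
    return [osds[(start + i) % len(osds)] for i in range(replicas)]

osds = ["osd.0", "osd.1", "osd.2", "osd.3"]
```

Because every client computes the same answer from the same inputs, reads and writes go straight to the right devices; the real CRUSH additionally weights devices and respects failure-domain hierarchies, which is what enables the per-use-case isolation and optimization noted above.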

Ceph as a Unified Storage Backend

Ceph provides unified storage and supports software-defined storage: it is a scale-out unified storage platform, it is cost-effective, and it is an open-source project, just like OpenStack.
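"Unified storage" here means one underlying object store (RADOS, in Ceph's case) exposed through block, file, and object interfaces at once. A minimal sketch of that layering, with a plain dictionary standing in for the cluster; the interface classes are illustrative, not Ceph's APIs.

```python
# Sketch of a unified backend: three different access interfaces
# (object, block, file) all write into the same underlying store.

backend = {}  # one shared object store underneath every interface

class ObjectGateway:              # S3/Swift-style key-value interface
    def put(self, key, data):
        backend[key] = data
    def get(self, key):
        return backend[key]

class BlockDevice:                # RBD-style interface: numbered blocks
    def write(self, lba, data):
        backend[f"block/{lba}"] = data
    def read(self, lba):
        return backend[f"block/{lba}"]

class FileSystem:                 # CephFS-style interface: paths
    def write(self, path, data):
        backend[f"file/{path}"] = data
    def read(self, path):
        return backend[f"file/{path}"]
```

Since all three interfaces share one backend, capacity, replication, and scaling are managed once, which is the operational appeal of a unified scale-out platform.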


We intend to preserve the dual-cluster layout in the future, with about 5.5 PB of total raw capacity. The old cluster gets a new set of head nodes, and both clusters are being relocated to another area of the RACF data center. Originally these installations supported only CephFS and RadosGW/S3 clients, but other gateway systems, such as GridFTP/CephFS, FTS3/CephFS, OpenStack Swift/Ceph, and (experimental) dCache/Ceph gateways, were added shortly after. (Finally, iSCSI is going to be eliminated from the storage backend.)

- Up to 8.7 GB/s of aggregated throughput with CephFS (client network uplink limited).
- Up to 1.7 GB/s of throughput via OpenStack Swift gateways (client network uplink limited).
- Up to 1 GB/s of I/O capability demonstrated with the RadosGW/S3 gateway subsystem in ANL-to-BNL object store tests (up to 24k simultaneous client connections permitted).

Enhanced 'Tier-3' services

[Diagram: multiple VMs and Tier-3/Cloud NFS servers, each acting as a Ceph client against a shared Ceph cluster.]

Tier-2 integration (StoRM / xrootd)

ATLAS DDM integration: StoRM with CephFS as the data backend (allows setting group ACLs on the fly); a secondary ATLAS LOCALGROUPDISK served by CephFS (using the kernel client); read access for local ATLAS users. ATLAS FAX integration: an xrootd server (using its POSIX interface). [Diagram: Rucio and web/CLI clients reach the ATLAS DDM SE through StoRM/xrootd Ceph clients; xrootd CLI clients get POSIX read access governed by Ceph ACLs.]

Challenges

Keep improving system efficiency from application experience, and make the system intelligent. Scale matters: just locating a resource in a huge index can be a challenge for scalability and speed. The system must be designed around the computing model (and the practices of real applications), and must ensure data consistency. Storage complexity: hardware evolution; the storage hierarchy, especially when considering hot and cold storage. Data protection and privacy: data protection laws can hinder the use of cross-boundary services, e.g. the EU data protection directive; higher data protection compliance (sensitive unclassified, classified). Sustainability: maintaining production-grade services cost-effectively.