Storage Virtualization
Eric Yen
Academia Sinica Grid Computing Centre (ASGC), Taiwan
Storage Virtualization
In computer science, storage virtualization applies virtualization techniques to computer data storage systems to enable better functionality and more advanced features. Within a storage system, two primary types of virtualization can occur:
- Block virtualization: the abstraction (separation) of logical storage (e.g., a partition) from physical storage, so that it may be accessed without regard to the physical layout or heterogeneous structure. This separation gives storage administrators greater flexibility in how they manage storage for end users.[1]
- File virtualization: addresses NAS challenges by eliminating the dependencies between data accessed at the file level and the location where the files are physically stored. This creates opportunities to optimize storage use, consolidate servers, and perform non-disruptive file migrations.
Managing the Research Data Lifecycle
[Diagram: Globus research data lifecycle, from a light source facility to a compute facility, personal computers, and a publication repository.]
1. Transfer: the PI initiates a transfer request, or it is requested automatically by a script or science gateway; Globus transfers the files reliably and securely. SaaS: only a web browser is required, access uses campus credentials, and Globus monitors and informs throughout.
2. Share: the PI selects files to share, selects a user or group, and sets access permissions; the researcher logs in and accesses the shared files with no local account required, downloading via Globus. Globus controls access to shared files on existing storage, so there is no need to move files to cloud storage.
3. Publish: the researcher assembles a data set and describes it using metadata (Dublin Core and domain-specific); a curator reviews and approves; the data set is published on a campus or other system.
4. Discover: peers and collaborators search for and discover data sets, then transfer and share them using Globus.
Source: Ian Foster, Globus
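The transfer step in this lifecycle can also be driven programmatically. Below is a minimal sketch using the Globus Python SDK; the client ID, endpoint UUIDs, paths, and label are placeholders, not values from the slides, and the OAuth flow is simplified.

```python
# Minimal sketch of a Globus transfer with the globus-sdk package; all
# UUIDs, paths, and the client ID are hypothetical placeholders.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"   # hypothetical registered app
SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"     # e.g., the light source facility
DST_ENDPOINT = "DEST-ENDPOINT-UUID"       # e.g., the compute facility

# Interactive native-app login to obtain a transfer token
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow()
print("Login at:", auth_client.oauth2_get_authorize_url())
tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)

# Describe and submit the transfer; Globus handles retries and integrity checks
tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                                label="beamline run", sync_level="checksum")
tdata.add_item("/data/run42/", "/project/run42/", recursive=True)
task = tc.submit_transfer(tdata)
print("Submitted transfer task:", task["task_id"])
```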
General Analysis Workflow of LHC Computing
[Diagram: simulation of the high-energy physics process and of soft physics, followed by simulation of the ATLAS detector, is compared against LHC data passed through reconstruction of the ATLAS detector and event selection; the analysis compares the observed m4l distribution with the Standard Model prediction P(m4l | SM[mH]) to evaluate prob(data | SM).]
Source: Wouter Verkerke, NIKHEF
Whole Genome Analysis Pipeline (GATK)
Physics and genomics both have big data, but in almost every respect they differ greatly from each other. Genomics data:
- is more complex and variable, and is used in more demanding ways;
- is growing faster than physics data;
- carries greater uncertainty on short timescales, leaving less time to respond;
- has less community-wide investment in software and infrastructure.
The Needs
- Integrated global services: resources aggregated and federated by cloud and distributed computing infrastructure; a data management platform and services for user communities across institutional boundaries.
- Scalable infrastructure and analysis services: use any mountable storage system as a local data store, with standard POSIX access via a global mount point.
- Scientific gateway requirements, for both big-data and long-tail sciences: data sharing, access, transmission, publication, discovery, archiving, etc.; combined with the infrastructure and middleware, this reduces the cost of creating and maintaining an ecosystem of integrated research applications.
- Simplified research data management: reuse and reproducibility; agility in adding/removing storage; geo-redirection; load balancing/failover.
- Bridging to public clouds: e.g., Amazon S3 endpoints, OpenStack/Ceph object storage (see the sketch after this list).
- Performance improvements: the cost of storage is much higher than the cost of compute, so better storage resource efficiency is needed.
- Easily deployed and shared by/with APAN communities.
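Bridging to an S3-compatible endpoint can be illustrated with a short sketch. The example below uses the boto3 client against a hypothetical Ceph RADOS Gateway endpoint; the endpoint URL, credentials, and bucket name are placeholders.

```python
# Sketch of accessing an S3-compatible object store (e.g., a Ceph RADOS
# Gateway or Amazon S3) through one client interface; the endpoint, keys,
# and bucket name are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://rgw.example.org",   # omit for Amazon S3 itself
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Upload a local file and list the bucket through the same API,
# regardless of which backend actually serves the endpoint.
s3.upload_file("results.tar.gz", "my-bucket", "run42/results.tar.gz")
for obj in s3.list_objects_v2(Bucket="my-bucket").get("Contents", []):
    print(obj["Key"], obj["Size"])
```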
Implementation Using Cloud Services
- Simplicity, reliability, economies of scale, scalability, flexibility, etc.
- Convergence of protocols: HTTP/WebDAV
- APIs for data services: e.g., transfer, authentication and authorization, group membership, etc.
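As a small illustration of the HTTP/DAV convergence, the sketch below lists a remote directory with a standard WebDAV PROPFIND request; the URL and credentials are placeholders, and any HTTP/DAV-capable storage endpoint could stand in.

```python
# Sketch of listing a remote collection over WebDAV (an HTTP extension);
# the URL and credentials are hypothetical placeholders.
import requests
import xml.etree.ElementTree as ET

url = "https://dav.example.org/data/run42/"
body = '<?xml version="1.0"?><propfind xmlns="DAV:"><allprop/></propfind>'

resp = requests.request(
    "PROPFIND", url,
    data=body,
    headers={"Depth": "1", "Content-Type": "application/xml"},
    auth=("user", "password"),
)
resp.raise_for_status()

# A 207 Multi-Status response contains one <response> element per entry
tree = ET.fromstring(resp.content)
for entry in tree.findall("{DAV:}response"):
    print(entry.find("{DAV:}href").text)
```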
Dynamic Federation
Source: DynaFed Project
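A federation front end such as DynaFed typically exposes a single HTTP/DAV namespace and redirects each read to a suitable replica. The sketch below shows how a client could observe that redirection against a hypothetical federation endpoint; the URL is a placeholder, not part of the DynaFed project documentation.

```python
# Sketch: observing how an HTTP federation endpoint (e.g., a DynaFed
# instance) redirects a read to one of the federated replicas.
# The federation URL below is a hypothetical placeholder.
import requests

fed_url = "https://federation.example.org/myfed/atlas/data/file001.root"

# Ask the federator where the file actually lives, without following it
resp = requests.get(fed_url, allow_redirects=False)

if resp.status_code in (302, 307):
    print("Redirected to replica:", resp.headers["Location"])
else:
    print("Served directly, status:", resp.status_code)
```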
Virtualizing xrootd Storage: Extending the Metadata Network
Example of an asynchronous read/write container:
- A central container catalog resides on persistent storage (P0, P1); data and metadata can also be cached in a container on volatile storage (V2) in a rack.
- Metadata and data writes are performed locally; the metadata update on the central catalog is asynchronous (lazy).
- Metadata is asynchronously migrated into the central catalog, and data is asynchronously migrated into the persistent storages (P1 and P2) according to replica policies.
- One WAN operation suffices to get/restore the container metadata; namespace operations for local mounts stay on the LAN.
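The lazy-metadata idea can be sketched as a toy model. The code below is only a conceptual illustration, not the xrootd/EOS implementation: writes update a local container cache immediately, and a background worker flushes the metadata to a central catalog asynchronously.

```python
# Conceptual toy sketch of lazy (asynchronous) metadata propagation:
# writes land in a local container cache immediately, and a background
# worker later migrates the metadata to a central catalog.
import queue
import threading
import time

central_catalog = {}     # stands in for the central container catalog
local_cache = {}         # stands in for the container cached on volatile storage
pending = queue.Queue()  # metadata updates awaiting migration

def write(path, size):
    """Local write: data and metadata are recorded locally, no WAN round trip."""
    local_cache[path] = {"size": size, "mtime": time.time()}
    pending.put(path)

def flush_worker():
    """Asynchronously migrate metadata into the central catalog."""
    while True:
        path = pending.get()
        central_catalog[path] = local_cache[path]   # one (batched) WAN operation
        pending.task_done()

threading.Thread(target=flush_worker, daemon=True).start()

write("/container/run42/file001.root", 1_200_000)
pending.join()           # wait for the lazy flush in this demo
print("central catalog:", central_catalog)
```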
Logical View of OSiRIS
Ceph in OSiRIS
Ceph gives us a robust open-source platform to host our multi-institutional science data:
- Self-healing and self-managing
- Multiple data interfaces
- Rapid development supported by Red Hat
- Able to tune components to best meet specific needs
Software-defined storage gives us more options for data lifecycle management automation:
- Sophisticated allocation mapping (CRUSH) to isolate, customize, and optimize by science use case
Ceph overview: https://umich.app.box.com/s/f8ftr82smlbuf5x8r256hay7660soafk
Ceph as a Unified Storage Backend
- Ceph provides unified storage (object, block, and file) and supports software-defined storage
- Ceph is a scale-out unified storage platform
- Ceph is cost-effective
- Ceph is an open-source project, just like OpenStack
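Ceph's native object layer (RADOS) can be exercised directly from Python. The sketch below assumes a reachable cluster, the python-rados bindings, and a pool named science-data; the pool name and config path are placeholders.

```python
# Sketch of writing and reading an object through Ceph's native RADOS
# interface using the python-rados bindings; the pool name and config
# path are hypothetical placeholders.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

try:
    ioctx = cluster.open_ioctx("science-data")   # assumed pool name
    try:
        # Store a small metadata object, then read it back
        ioctx.write_full("run42.meta", b'{"events": 123456}')
        print(ioctx.read("run42.meta"))
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```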
We intend to preserve the dual-cluster layout in the future, with ~5.5 PB of total raw capacity. The old cluster gets a new set of head nodes, and both clusters are being relocated to another area of the RACF data center. Originally these installations supported only CephFS and RadosGW/S3 clients, but other gateway systems such as GridFTP/CephFS, FTS3/CephFS, OpenStack Swift/Ceph, and (experimental) dCache/Ceph gateways were added shortly after. (Finally, iSCSI is going to be eliminated from the storage backend.)
- Up to 8.7 GB/s of aggregated throughput with CephFS (limited by the client network uplink)
- Up to 1.7 GB/s of throughput via the OpenStack Swift gateways (limited by the client network uplink)
- Up to 1 GB/s of I/O capability demonstrated with the RadosGW/S3 gateway subsystem in ANL-to-BNL object store tests (up to 24k simultaneous client connections permitted)
Enhanced 'Tier-3' Services
[Diagram: several Tier-3 NFS servers and a cloud NFS server, each acting as a Ceph client, export storage from a shared Ceph cluster to groups of VMs.]
Tier-2 Integration (StoRM / xrootd)
- ATLAS DDM integration: StoRM with CephFS as the data backend (allows setting group ACLs on the fly); a secondary ATLAS LOCALGROUPDISK served by CephFS (using the kernel client); read access for local ATLAS users.
- ATLAS FAX integration: an xrootd server using its POSIX interface.
[Diagram: Rucio (web or CLI) reaches the ATLAS DDM SE through StoRM/xrootd running as a Ceph client; the xrootd CLI reaches the ATLAS FAX xrootd server, also a Ceph client; local users get POSIX read access governed by Ceph ACLs.]
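Setting group ACLs on the fly, as done on the CephFS backend behind StoRM, can be illustrated with standard POSIX ACL tooling. The sketch below is not StoRM's code; the directory and group name are hypothetical placeholders.

```python
# Sketch of granting a group read access on a CephFS-backed directory via
# standard POSIX ACLs; the path and group name are hypothetical placeholders,
# and this is an illustration rather than StoRM's implementation.
import subprocess

def grant_group_read(path: str, group: str) -> None:
    """Add a read+execute ACL entry for `group` on `path` using setfacl."""
    subprocess.run(
        ["setfacl", "-m", f"g:{group}:rx", path],
        check=True,
    )

grant_group_read("/cephfs/atlas/localgroupdisk/user.alice", "atlas-local")
```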
Challenges
- Keep improving system efficiency from application experience, and make the system intelligent
- Scale matters: just locating a resource in a huge index can be a challenge for scalability and speed
- Must be designed around the computing model (and the practices of real applications)
- Ensure data consistency
- Storage complexity: hardware evolution; the storage hierarchy, especially when considering hot and cold storage
- Data protection and privacy: data protection laws that hinder the use of cross-boundary services (e.g., the EU data protection directive); higher data protection compliance (sensitive unclassified, classified)
- Sustainability: maintaining production-grade services cost-effectively