Distributed File Systems Part IV. Hierarchical Mass Storage Systems

Distributed File Systems Part IV
Daniel A. Menascé

Hierarchical Mass Storage Systems
- On-line data requirements
- Mass storage systems concepts
- Mass storage system architectures
- Example systems
- Performance of mass storage systems

On-line data requirements

Thomas Jefferson National Accelerator Facility:
- will collect over one terabyte of raw accelerator data per day
- when post-processing is included, 500 TB of raw and formatted data will be generated per year
- a total of one petabyte (1,000 TB) stored by the year 2000

FY   CPU (MIPS)   Disk (GB)   Near-line Tape (TB)
96   2K           100         5
97   10K          500         150
98   20K          1000        300
99   30K          2000        1200

On-line data requirements

DKRZ, Hamburg, Germany: climate research support and complex climate simulations

                                          97    98    99    2000
CPU performance (GFlops)                  20    20    35    40
data generation rate (GB/day)             150   150   300   400
required data archival capacity (TB)      80    120   200   300
required peak transfer rate (MBytes/sec)  60    60    120   160

On-line data requirements

Goddard Space Flight Center Distributed Active Archive Center, Greenbelt, MD:
- supports Earth Observing System (EOS) science datasets
- about 600 TB ordered per month in 1995
- peak of 250 TB ordered per hour in 1995
- users tend to reference files just created or files created a long time ago (over 3 months)
- close to 4,000 tape mounts per week
- about 50 files transferred per tape mount

On-line data requirements

NASA's Center for Computational Sciences, Greenbelt, MD:
- supports space and Earth researchers
- 6 StorageTek silos with 28.8 TB and one IBM 3494 robotic tape library with an additional 24 TB
- about 1 TB retrieved per week
- about 700 TB of robotic storage will be needed by the year 2000

[Figure: NASA's Center for Computational Sciences, total terabytes stored]

[Figure: NASA's Center for Computational Sciences, workload intensity]

Hierarchical Mass Storage Systems

[Figure: storage hierarchy plotted by access time and cost per MByte: RAM, magnetic disks, robotically mounted tapes, off-line tapes]

Hierarchical Mass Storage Systems

How can disk access times be obtained at a cost per MByte comparable with magnetic tapes?
- disk caches
- automatic file migration between the disk cache and the tape subsystem
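The trade-off posed by this question can be made concrete with a small calculation. The sketch below is not from the slides: the function names, the hit ratio, the access times, and the prices are hypothetical values chosen only for illustration. It assumes every file has a copy on tape and that a fraction of requests (the hit ratio) is served from the disk cache.

```python
def effective_access_time(hit_ratio, t_disk, t_tape):
    """Mean access time when hit_ratio of the requests is served from the disk cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * t_disk + (1.0 - hit_ratio) * t_tape


def cost_per_mb(disk_mb, tape_mb, disk_price, tape_price):
    """Cost per MByte of stored data, assuming all data resides on tape
    and the disk cache only holds copies."""
    return (disk_mb * disk_price + tape_mb * tape_price) / tape_mb


# Hypothetical numbers: 20 ms disk access, 60 s tape mount and seek.
print(effective_access_time(0.90, 0.02, 60.0))   # seconds per request
print(effective_access_time(0.99, 0.02, 60.0))
# Hypothetical prices: disk 10x more expensive than tape, cache is 1% of the archive.
print(cost_per_mb(1_000, 100_000, 1.0, 0.1))     # close to the tape price
```

With these assumed values the cost per MByte stays near that of tape, while the mean access time depends strongly on the hit ratio, which is why the migration policy matters.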

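Mass Storage Systems: Disk Cache

[Figure: disk cache holding files F1 to F6, a tape drive, and a robot serving tapes that store F1, ..., F100]

Mass Storage Systems: Cache Miss

[Figure: disk cache holding F1, F2, F3, F4, F6; file F9 at the tape drive; robot serving tapes that store F1, ..., F100]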
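The cache hit and cache miss cases pictured above can be sketched in a few lines of Python. This is an illustrative model only: the class name `DiskCache`, the least-recently-used eviction rule, and the file sizes are assumptions, not details taken from the slides or from any particular product.

```python
from collections import OrderedDict


class DiskCache:
    """Toy disk cache in front of a tape library (hypothetical model)."""

    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.files = OrderedDict()   # name -> size in MB, least recently used first
        self.used_mb = 0
        self.migrated = []           # names pushed out of the cache, in order

    def access(self, name, size_mb):
        """Return True on a cache hit, False on a miss (file staged from tape)."""
        if name in self.files:
            self.files.move_to_end(name)      # mark as most recently used
            return True
        self.files[name] = size_mb            # stage the file from tape
        self.used_mb += size_mb
        self._migrate()
        return False

    def _migrate(self):
        """Evict least recently used files until the cache fits again."""
        while self.used_mb > self.capacity_mb and len(self.files) > 1:
            victim, size = self.files.popitem(last=False)
            self.used_mb -= size
            self.migrated.append(victim)


cache = DiskCache(capacity_mb=30)
for name in ["F1", "F2", "F3"]:
    cache.access(name, 10)        # three misses fill the cache
print(cache.access("F1", 10))     # hit
print(cache.access("F9", 10))     # miss: F9 is staged, the least recently used file leaves
print(cache.migrated, list(cache.files))
```

A real system would also write modified files back to tape before releasing their disk space; that step is omitted here.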

Mass Storage Systems: Migration between levels

[Figure: disk cache holding F1, F2, F3, F4, F6, F9; file F9 at the tape drive; robot]
- files unused for a long time are automatically migrated to tape.

Mass Storage System Architectures

Host Attached
- all peripherals are attached to the file server
- all data transfers between disk and tapes have to use the file server's (host) main memory

Network Attached
- peripherals are connected directly to the network
- data transfers between disk and tape do not use the file server's main memory

Host-attached MSSs

[Figure: clients connected to a file server with disk cache and tape; MSS host file server with disks, disk cache, and robotic tape server]

Host Attached Device Based Mass Storage System

Cray - Convex/UniTree Mass Storage System
- 330 gigabytes disk (formatted)
- 8 StorageTek 3490 freestanding cartridge drives
- StorageTek ACS 1 9310 Powderhorn silo
- 8 cartridge drives (3490)

Cray C98
- 6 CPUs, 1 gigaflop per processor
- 256 megawords central memory
- 512 megawords SSD

Network-attached MSSs

[Figure: clients, HA disk file server, NA disk server, and NA tape server connected by a storage access control network, a high speed data network (e.g. HIPPI), and a storage unit control network]

Transfer Protocols: Device to Device Transfer

Network-attached MSSs

[Figure: the network-attached configuration (clients, HA disk file server, NA disk server, NA tape server, storage access control network, high speed data network, storage unit control network) annotated with numbered transfer steps: 1; 2, 3, 7; 2, 4, 7; 5*; 6*]

Network-attached MSSs Features
- Separation of control and data paths
- Scalability: host memory is no longer the bottleneck.
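The scalability claim can be illustrated with a standard bottleneck bound from operational analysis: the throughput of a system cannot exceed the reciprocal of the largest service demand at any resource. The sketch below is an assumption-laden illustration, not the model used in the slides. The resource names and the service demands (seconds of service per file transfer) are hypothetical.

```python
def max_throughput(demands):
    """Return (bottleneck resource, upper bound on throughput in requests/s).

    demands maps a resource name to its service demand in seconds per request.
    """
    bottleneck = max(demands, key=demands.get)
    return bottleneck, 1.0 / demands[bottleneck]


# Hypothetical demands. In the host-attached case every byte crosses the
# file server's memory, so the file server demand is large.
host_attached = {"file server": 0.50, "disk": 0.20, "tape": 0.40}

# In the network-attached case the file server only handles control messages,
# and data moves device to device over the data network.
network_attached = {"file server": 0.05, "disk": 0.20, "tape": 0.40,
                    "data network": 0.10}

print(max_throughput(host_attached))      # bottleneck: file server
print(max_throughput(network_attached))   # bottleneck: tape
```

Under these assumed demands the bottleneck moves from the file server to the tape subsystem, which can be scaled by adding drives without touching the host.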

Robotic Tape Library

[Figure: robotic tape library showing the robot, the robot track, cartridge tape drives, tape cartridges, and a tape cartridge to be mounted]

Examples of Devices for MSSs

StorageTek Powderhorn (robotic tape library):
- 6,000 cartridge capacity
- 1-4 tape cartridge drives
- 2-16 robotic arms
- up to 350 tape exchanges/hour
- separation of control and data paths

Examples of Devices for MSSs

Sony DMS-B1000 (robotic tape library):
- 1,104 DTF tapes (12 GB per tape)
- up to 4 tape drives
- maximum data capacity of 13.2 TB
- access time < 6 sec
- separation of control and data paths
- tape drive: 300 MB/sec search speed, 12 MB/sec transfer rate, 40 sec rewind time

File Systems for MSSs

AMASS (EMASS)
- UNIX file system interface
- direct access to automated tape libraries

Unitree (UniTree Software Inc.)
- based on the IEEE Mass Storage Reference Model
- NFS and ftp interface
- client/server architecture
- multiple robot/media support
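The Sony DMS-B1000 figures listed above can be checked with simple arithmetic. The sketch below uses only the numbers from the slide (1,104 tapes, 12 GB per tape, 4 drives, 12 MB/sec per drive). The function names are illustrative, and two simplifications are assumed: decimal units (1 GB = 1,000 MB) and no time spent on mounts, seeks, or rewinds.

```python
MB_PER_GB = 1000  # assumption: decimal units


def library_capacity_gb(tapes, gb_per_tape):
    """Total capacity of the library in GB."""
    return tapes * gb_per_tape


def tape_read_seconds(gb_per_tape, transfer_mb_s):
    """Time to stream one full tape at the drive transfer rate."""
    return gb_per_tape * MB_PER_GB / transfer_mb_s


def library_read_hours(tapes, gb_per_tape, drives, transfer_mb_s):
    """Time to stream the whole library with all drives busy, ignoring mounts."""
    total_mb = library_capacity_gb(tapes, gb_per_tape) * MB_PER_GB
    return total_mb / (drives * transfer_mb_s) / 3600


print(library_capacity_gb(1104, 12))                   # 13248 GB, about 13.2 TB
print(tape_read_seconds(12, 12))                       # 1000.0 seconds per tape
print(round(library_read_hours(1104, 12, 4, 12), 1))   # 76.7 hours
```

The capacity agrees with the 13.2 TB quoted on the slide, and the per-tape streaming time shows why transfer rate, and not only the sub-6-second access time, dominates retrieval of large files.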

Mass Storage System Example: Unitree
- Central File Manager.
- Robotically mounted tape system (24,000 tapes).
- Off-line tape library.
- Magnetic disk file cache (155 GBytes).
- Automatic migration between levels.
- Compliance with the IEEE Mass Storage System Reference Model.

Unitree I/O Architecture

[Figure: Convex C3830 with four IDC units, four TLI units, and four tape silos]

Convex Unitree Diagram

[Figure: Convex Unitree configuration with control unit, Sun WS, and tape silo]

Workload Characterization
- k-means clustering was performed on the file sizes of the requests.
- Larger k gives a better fit and more classes in the model.
- A tightness measure was used:

  $d_j = \frac{1}{|s_j|} \sum_{p_i \in s_j} d_{ij}$,    $t = \frac{1}{k} \sum_{j=1}^{k} d_j$

Class   File Size (MB)   Frequency of Occurrence
Get-1   1.2              33.8%
Get-2   19.6             9.9%
Get-3   78.9             4.2%
Get-4   220.6            1.4%
Put-1   1.7              42.3%
Put-2   34.8             3.3%
Put-3   77.7             3.9%
Put-4   144.1            1.2%
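The clustering step can be sketched as follows. This is a minimal one-dimensional k-means, not the authors' implementation. It assumes that d_ij is the absolute difference between file size p_i and the centroid of cluster j, that s_j is the set of points in cluster j, and that initial centroids are spread evenly over the sorted data. The sample file sizes are made up.

```python
def assign(data, centroids):
    """Put each file size into the cluster with the nearest centroid."""
    clusters = [[] for _ in centroids]
    for x in data:
        nearest = min(range(len(centroids)), key=lambda j: abs(x - centroids[j]))
        clusters[nearest].append(x)
    return clusters


def kmeans_1d(sizes, k, iterations=50):
    """Lloyd's algorithm on scalar file sizes; returns (centroids, clusters)."""
    data = sorted(sizes)
    # Deterministic start (an assumption): spread centroids over the sorted data.
    centroids = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    for _ in range(iterations):
        clusters = assign(data, centroids)
        updated = [sum(c) / len(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(data, centroids)


def tightness(centroids, clusters):
    """t = (1/k) * sum of d_j, where d_j is the mean distance to centroid j."""
    d = [sum(abs(x - m) for x in c) / len(c) if c else 0.0
         for m, c in zip(centroids, clusters)]
    return sum(d) / len(d)


# Hypothetical file sizes in MB.
sizes_mb = [1.0, 1.5, 2.0, 18.0, 20.0, 22.0, 75.0, 80.0, 85.0, 210.0, 230.0]
for k in (2, 4):
    centroids, clusters = kmeans_1d(sizes_mb, k)
    print(k, centroids, tightness(centroids, clusters))
```

On this sample the tightness value drops sharply from k = 2 to k = 4, which matches the remark that a larger k gives a better fit at the price of more classes in the model.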

Host-attached MSSs

[Figure: clients connected to a file server with disk cache and tape; MSS host file server with disks, disk cache, and robotic tape server]

Queuing Network Model

[Figure: queuing network model]

Workload Intensity Increase Results

[Figure: results for increasing workload intensity]

Client and Server Compression

[Figure: client compression and server compression]

File Abstraction

Network-attached MSSs

[Figure: clients, HA disk file server, NA disk server, and NA tape server connected by a storage access control network, a high speed data network (e.g. HIPPI), and a storage unit control network]

Queuing Network Model

[Figure: queuing network model]

HA Based vs. NA Based MSSs

[Figure: comparison of host-attached and network-attached MSSs]

Trends in Distributed File Systems

New Hardware:
- cheap main memory: file system in main memory with backups on videotape or optical disks.
- extremely fast fiber optic networks: avoid client caching.

Scalability: from 100 to 1,000 to 10,000 nodes!
- use of broadcast messages should be reduced.
- resources and algorithms should not be linear in the number of users.

Trends in Distributed File Systems

Wide Area Networking:
- Present: backbone at 45 Mbps and access bandwidth at 19.2 Kbps.
- Future: backbone at a few Gbps and access bandwidth at 56 Kbps or higher with cable modems.

Mobile Users:
- increase in disconnected operation mode.
- files will be cached for longer periods (hours or days) at the client laptop.

Trends in Distributed File Systems

Fault Tolerance:
- down times are not well accepted by people in general.
- as distributed systems become more widespread, provisions for higher availability have to be incorporated into the design.

Multimedia:
- new applications such as video-on-demand and audio files pose different demands on the design of a file system.