Overcoming Obstacles to Petabyte Archives

Similar documents
OPENARCHIVE The Final Destination of Your Data. Quick Start Guide

XtreemStore A SCALABLE STORAGE MANAGEMENT SOFTWARE WITHOUT LIMITS YOUR DATA. YOUR CONTROL

TS7700 Technical Update What s that I hear about R3.2?

Integrating Fibre Channel Storage Devices into the NCAR MSS

Chapter 4 Data Movement Process

TS7700 Technical Update TS7720 Tape Attach Deep Dive

Copyright 2010 EMC Corporation. Do not Copy - All Rights Reserved.

Storage Optimization with Oracle Database 11g

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

GFS: The Google File System

Universal Storage Consistency of DASD and Virtual Tape

Exam : S Title : Snia Storage Network Management/Administration. Version : Demo

NetVault Backup Client and Server Sizing Guide 3.0

Mainframe Virtual Tape: Improve Operational Efficiencies and Mitigate Risk in the Data Center

IBM Spectrum Protect Version Introduction to Data Protection Solutions IBM

Distributed File Systems II

EMC Solutions for Backup to Disk EMC Celerra LAN Backup to Disk with IBM Tivoli Storage Manager Best Practices Planning

Chapter 2 CommVault Data Management Concepts

Implementing a Hierarchical Storage Management system in a large-scale Lustre and HPSS environment

Lustre overview and roadmap to Exascale computing

Current Topics in OS Research. So, what s hot?

The Future of Data Archive

1Z0-433

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Realizing the Promise of SANs

NetVault Backup Client and Server Sizing Guide 2.1

Distributed Systems 16. Distributed File Systems II

T e c h n o l o g y. LaserTAPE: The Future of Storage

Veritas NetBackup on Cisco UCS S3260 Storage Server

Chapter 11: File System Implementation. Objectives

CA485 Ray Walshe Google File System

The advantages of architecting an open iscsi SAN

Providing a first class, enterprise-level, backup and archive service for Oxford University

EMC Data Domain for Archiving Are You Kidding?

Data Movement & Tiering with DMF 7

CSE 124: Networked Services Fall 2009 Lecture-19

IBM High-End Disk Solutions Version 5.

RAIDIX Data Storage Solution. Clustered Data Storage Based on the RAIDIX Software and GPFS File System

designed. engineered. results. Parallel DMF

VERITAS Volume Replicator Successful Replication and Disaster Recovery

Chapter 12: File System Implementation

Solution Brief: Archiving Avid Interplay Projects using NLT and XenData

VERITAS Storage Foundation 4.0 TM for Databases

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Chapter 11. SnapProtect Technology

The Google File System. Alexandru Costan

<Insert Picture Here> Filesystem Features and Performance


Storage Resource Sharing with CASTOR.

Configuring IBM Spectrum Protect for IBM Spectrum Scale Active File Management

XenData Product Brief: XenData6 Server Software

Configuring Short RPO with Actifio StreamSnap and Dedup-Async Replication

Implementing a Digital Video Archive Based on the Sony PetaSite and XenData Software

Contingency Planning and Disaster Recovery

Storage Management Solutions. The Case For Efficient Data Management In Broadcast

Lesson 9 Transcript: Backup and Recovery

XenData MX64 Edition. Product Brief:

IBM Virtualization Engine TS7720 and TS7740 Releases 1.6, 1.7, 2.0, 2.1 and 2.1 PGA2 Performance White Paper Version 2.0

Pushing the Limits. ADSM Symposium Sheelagh Treweek September 1999 Oxford University Computing Services 1

An Introduction to GPFS

Content Addressed Storage (CAS)

CISC 7310X. C11: Mass Storage. Hui Chen Department of Computer & Information Science CUNY Brooklyn College. 4/19/2018 CUNY Brooklyn College

5 Fundamental Strategies for Building a Data-centered Data Center

White paper Version 3.10

Enterprise Vault Whitepaper Enterprise Vault Integration with Veritas Products

IBM Storwize V7000 Unified

Veritas Exam VCS-371 Administration of Veritas NetBackup 7.5 for Windows Version: 7.2 [ Total Questions: 207 ]

Backup Solutions with (DSS) July 2009

Insights into TSM/HSM for UNIX and Windows

UK LUG 10 th July Lustre at Exascale. Eric Barton. CTO Whamcloud, Inc Whamcloud, Inc.

Backup and archiving need not to create headaches new pain relievers are around

Tivoli Storage Manager V3.7 Performance Update Joseph Giulianelli IBM San Jose

The Google File System

Table 9. ASCI Data Storage Requirements

B. create a Storage Lifecycle Policy that contains a duplication to the MSDP pool

File System: Interface and Implmentation


HP Supporting the HP ProLiant Storage Server Product Family.

Veeam Cloud Connect. Version 8.0. Administrator Guide

IBM i Version 7.3. Systems management Disk management IBM

IT Infrastructure: Poised for Change

Shared Parallel Filesystems in Heterogeneous Linux Multi-Cluster Environments

IBM TotalStorage Enterprise Storage Server Model 800

Storing Data: Disks and Files

IBM System Storage DS5020 Express

Archive Appliance. Published Product Release Notes

GFS: The Google File System. Dr. Yingwu Zhu

A Thorough Introduction to 64-Bit Aggregates

Exam Name: IBM Tivoli Storage Manager V6.2

6,000 Cameras in Time Square 210 million Cameras worldwide

CSE 124: Networked Services Lecture-16

Today: Coda, xfs! Brief overview of other file systems. Distributed File System Requirements!

Yves Goeleven. Solution Architect - Particular Software. Shipping software since Azure MVP since Co-founder & board member AZUG

VERITAS Volume Replicator. Successful Replication and Disaster Recovery

SAP Applications on IBM XIV System Storage

A Thorough Introduction to 64-Bit Aggregates

The Right Choice for DR: Data Guard, Stretch Clusters, or Remote Mirroring. Ashish Ray Group Product Manager Oracle Corporation

C13: Files and Directories: System s Perspective

Performance Monitoring User s Manual

Coordinating Parallel HSM in Object-based Cluster Filesystems

Transcription:

Overcoming Obstacles to Petabyte Archives Mike Holland Grau Data Storage, Inc. 609 S. Taylor Ave., Unit E, Louisville CO 80027-3091 Phone: +1-303-664-0060 FAX: +1-303-664-1680 E-mail: Mike@GrauData.com Presented at the THIC Meeting at the National Center for Atmospheric Research Boulder CO 80305-5602 June 11-12, 2002 Slide 1

A Petabyte of Archive is Easy? 30 cu ft Our system utilizes SONY AIT technology, so 1 PB is only 10,000 tapes or about enough tapes to fill a box 1M on a side In the future, we can comment about the size of the cube Slide 2

Components of an Archive Application Server Network User Community Content Catalog Storage Mgmt. Software RAID Library Drives Media Slide 3

The Challenges of a PB Archive have less to do with sheer Storage Support for Users Common Access to all of the Data Recall Characteristics Data Acquisition Rates Data Expiration Rates and Reorganization of Media Catalog Size (File System Entries) System Conversions Absorption of Older Archives Transcription to Newer Technologies Data Protection Schemes (Media Availability) System Availability (Fault Resilience) Recovery Slide 4

I/O Activity within Archive Number of Recall Users Concurrency Peak workloads Data Acquisition Rates Data Request Queue Time Expectations Data Access Patterns Locality of reference for pre-stage data caching Scheduled recalls staged for faster use Actually, less concerned about I/O than management within archive Slide 5

What Does a PetaByte Imply? 10 Years to Fill Archive 5 2 1 0 5 10 15 20 25 Sustained Data Rate MB/Sec 30 35 (24 hrs/day * 365 days/yr) Slide 6

What Does a PetaByte Imply? 100000 HSM File System Entries (M) 10000 1000 100 10 Current Industry Design Goal 1 50KB 1MB 100MB 1GB Average File Sizes Slide 7

PB Archive Must Be Scalable Throughout Resource Manager File System File System File System Meta Data Server Archive Archive # of files drives absolute requirement for multiple filesystems All agents are connected to the RM and MDS Front End s ( File System ) are connected to Back End s ( Archive ) through IP connections. Requirement for a stable archiving OS

So, It s Agreed! We Never Delete Any Files! Present a common view of data Add older systems to a current system Continually migrate data to newer storage media Network IVD - FS Metadata Foreign File System Capture old catalog into new management system Migrate old data in background Slide 9

High Availability Resource Manager Meta Data Server File System Archive Resource Manager Meta Data Server Multiple instances of Resource Manager and Meta Data Server may exist within in one system Two or more copies of control information are journaled onto separate disk subsystems

Data Protection Schemes To guarantee data security, IVD can store several copies of the data to different media pools, local or remote Pools may be located far away from each other but connected through TCP/IP File System Archive Archive

IVD - Policies Each FE can manage several partitions A partition contains a file system and one or more tape pools Each partition has a wide set of migration and truncation policies: Partition, directory or specific file level File access time and file size based Wildcard or file extension based Exclude lists Priority based job execution (migration, recall, reorg) Extended via external filter script for user defined policies Associative file grouping for migrate and recall as a group Slide 12

Policy Templates Preconfigured policy sets for the most common use Premigration: Data are copied to the tape soon after they are stored to secondary media but stay on disk. If disk space is needed, older data are removed. Immediate Migration: Data are copied to secondary media soon after they are stored to IVD and removed from disk immediately. Triggered Migration: Certain filesystem events control data migration to secondary media. User is free to define policies of any style, using individual scripts Slide 13

The Commercial Challenge How do we fill up a 35TB archive, much less a Petabyte? Changes in the industry regarding management of data More sophisticated data copy and migration More attention to TCO of storage Re-purposing data and data mining Regulated communities are seeing larger, online requirements including disaster recovery Data intensive video community ready to move beyond sneaker-net Closer relationship between backup and archiving Slide 14

Archiving at it s Best: Today Windows2000 NT Backup Server UNIX MAC NAS Dual copy Asynchronous Replication of each partition in same unit IVD1 Server IVD2 Server Remote copy Asynchronous replication to second unit via the network via dedicated network connection On campus, across town, across the country IVD1 master IVD2 copy DFS Server IVD2 master IVD1 copy High Availability DFS system Load balancing Disaster Recovery High Availability Single system view Slide 15

Archiving later this Year Multiple Servers within an IVD Complex plus multiple Data Movers to ingest data for the Archive Client-1 Client-2 Client-3 IVD-Client ML-1 IVD-Client ML-1 IVD-Client ML-1 LAN IVD-box Server-1 RAID disk CfgDB, RMDB, FSC, Disk Buffer (optional) Disk Buffer (optional) IVD-Server IVD- (FE/BE) CfgD B, R M DB, FSC IVD- (FE/BE) Server-2 RAID disk Disk Buffer (optional) TLU IVD-LAN Slide 16

Grau System Capacities IVD-XL Capacity (PB) 5 4.5 4 3.5 3 2.5 2 1.5 1 0.5 0 AIT-3 at 100GB/Cart S-AIT at 500GB/Cart LaserTAPE at 1000GB/Cart 576 MB/S with AIT-3 720 MB/s with S-AIT 960 MB/s with LaserTAPE AIT-3 S-AIT LOTS Slide 17

IVD-XL 1PB of Archive Capacity Slide 18