Nils Haustein Executive IT Specialist EMEA Storage Competence Center Introduction to Digital Archiving and IBM archive storage options
Agenda
  Introduction to Archiving
  Archiving Techniques
  IBM Archive Storage Options
Archiving: why, what and how?
  Archiving is a well-planned process in which data that usually no longer changes is moved into an archiving system.
  Data access, search, processing and compliance are guaranteed over long life cycles.
  Archiving requires strategic thinking!
Reasons, Requirements, Challenges
  Reasons: data growth, regulatory duty, preservation of information
  Requirements: cost and efficiency; scalability and flexibility; compliance*; operative requirements
  Challenges: long lifecycle, technological progress
  *Compliance in accordance with regulatory requirements
Archive system architecture
  Archive sources: e-mails, files, ERP, database, PACS, paper, www
  Archive management: connectors and converters; Enterprise Content Management system (indexing, search, discovery, information management)
  Archive storage
Backup vs. Archiving
  Backup
    For recovery
    Data is copied
    Short-term retention
    Multiple versions
    Compliance usually not required
  Archiving
    For long-term retention
    Data is moved to the archive
    Retention periods must be enforced over a long lifecycle
    Compliance usually required
  Backup is used to protect archived data.
Archive Management functions
  Archive management performs the archiving & retrieval process
    Selection, collection, ingest
    Metadata extraction and indexing
    Information and data management
    Search and retrieve
  Information management
    Classification, indexing & search
    Retention and business process management
  Data management
    Access control, auditing
    Expiration and migration
  Diagram labels: Archive Management (Information Management, Data Management, Index), Archive Storage
Archiving & retrieval processes
  Archiving is driven by archive management
    Select the data to be archived from the archive source
    Transfer the data into the archive management system
    Extract metadata and store it as the index
    Store the data in archive storage
  Retrieval is driven by archive management
    Provide a search & discovery function to find data
      The search client may be integrated in the archive source and/or in archive management
    Locate the selected data in archive storage using the index
    Read the data from archive storage and provide it to the requesting application
  Diagram labels: Search and Discovery, Archive Management, Index, Archive Storage (archive and retrieve paths)
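The archive and retrieve steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ArchiveManager`, its methods and the dictionaries standing in for the archive source and archive storage are hypothetical names, not part of any IBM product.

```python
from dataclasses import dataclass, field


@dataclass
class ArchiveManager:
    """Toy archive management layer: keeps an index and talks to archive storage."""
    storage: dict = field(default_factory=dict)  # stands in for archive storage
    index: dict = field(default_factory=dict)    # object id -> extracted metadata
    _next_id: int = 0

    def archive(self, source: dict, name: str) -> int:
        """Select an item from the source, move it into storage and index it."""
        data = source.pop(name)  # data is moved, not copied
        object_id = self._next_id
        self._next_id += 1
        self.index[object_id] = {"name": name, "size": len(data)}  # metadata
        self.storage[object_id] = data
        return object_id

    def search(self, **criteria) -> list:
        """Return ids of all objects whose metadata matches every criterion."""
        return [oid for oid, meta in self.index.items()
                if all(meta.get(k) == v for k, v in criteria.items())]

    def retrieve(self, object_id: int) -> bytes:
        """Locate the object via the index and read it from archive storage."""
        if object_id not in self.index:
            raise KeyError(f"object {object_id} is not in the index")
        return self.storage[object_id]


source = {"report.pdf": b"quarterly figures"}
manager = ArchiveManager()
object_id = manager.archive(source, "report.pdf")
hits = manager.search(name="report.pdf")
print(manager.retrieve(hits[0]))
```

Note that `archive` removes the item from the source, which mirrors the "data is moved" property that distinguishes archiving from backup.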
Archive Storage functions
  Archive storage stores and retains data
  Complements archive management with value-adding storage functions
    Write-Once-Read-Many (WORM) protection
    High availability and disaster protection
    Data management (tiered storage, encryption, compression, deduplication, hybrid cloud etc.)
  The type of archive storage depends on the requirements
  Diagram labels: Archive Management (Data Management, Index), Archive Storage
Archive storage techniques
  Standardized interfaces
    Future-proof, fostering interoperability
    No vendor lock-in
  Support of different storage media technologies
    Increases flexibility and scalability
    Supports compliance aspects
  Tiered storage
    Supports cost efficiency
    Allows migration to new storage media
  Diagram labels: Standard Interface, Archive Storage
Find out more about archiving
  Storage Networks Explained, second edition
  The second edition covers archiving, business continuity and FCoE.
  2nd edition, September 2009, 568 pages, hardcover
  ISBN-10: 0-470-74143-0
  http://www.wiley-vch.de/publish/en/books/forthcomingtitles/ee00/0-470-74143-0/?sid=502qbja41e6nfl1sl0qs1v9tc4
What is the best archive storage medium?
  Longevity of the medium is not the dominating factor
  Logical and physical migration is inevitable [1]
    Logical migration: applications, ECM, formats
    Physical migration: platforms, networks, storage
  Typical criteria for archive storage: operating cost, scalability, future-proofing, compliance
  Media options: flash, disk, optical, tape, cloud
  [1] Also see: 100 Year Archive Requirements Survey by the SNIA Data Management Forum
Why tape is good for archiving
  Tape has superior TCO: 3 to 10 times lower cost than disk over 5 to 10 years
  Tape has a long lifecycle: ~10 years per generation
  Tape is reliable: read-after-write verification and 2-dimensional ECC
  Tape is storage efficient: advanced compression
  Tape is secure: encryption and Write Once Read Many (WORM)
  Tape is standardized: LTO and LTFS
  Tape has high potential to scale: the size of a bit cell on tape can still be scaled down
Scalability of tape
  April 2015: IBM Research demonstrated a new record of 123 Gbit/in² in areal data density on magnetic particulate tape
  LTO-6 has 1.38 Gbit/in²
  At this areal density, a standard LTO-size cartridge could store up to 220 terabytes of uncompressed data
  This is an 88-fold improvement over an LTO-6 cartridge
  Figure: visualization of bit cells for different storage techniques
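The 88-fold figure can be checked with simple arithmetic. The sketch below uses the numbers quoted on the slide plus one assumption that is not on the slide: an LTO-6 native (uncompressed) cartridge capacity of 2.5 TB.

```python
# Figures quoted on the slide
demo_capacity_tb = 220        # projected cartridge capacity at the demonstrated density
demo_density_gbit_in2 = 123   # demonstrated areal density
lto6_density_gbit_in2 = 1.38  # LTO-6 areal density as quoted on the slide

# Assumption (not on the slide): LTO-6 native cartridge capacity is 2.5 TB
lto6_capacity_tb = 2.5

capacity_ratio = demo_capacity_tb / lto6_capacity_tb
density_ratio = demo_density_gbit_in2 / lto6_density_gbit_in2

print(f"capacity ratio: {capacity_ratio:.0f}x")
print(f"areal density ratio: {density_ratio:.1f}x")
```

The capacity ratio reproduces the 88-fold improvement, and the areal density ratio comes out at roughly the same factor, so the two quoted figures are consistent with each other.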
Total cost of ownership comparison
  Cost                          Disk            Tape
  Cost per GB per month         10 cents        0.77 cents
  Cost per petabyte per month   $100K           $7.7K
  Cost over 5 years             $6.6 million    $462 thousand
  Source: third-party independent studies (Clipper Group, Enterprise Strategy Group)
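The relationship between the rows of the comparison can be reproduced with a short calculation. This is a sketch that assumes the per-GB prices are in cents (the unit symbol is missing in the slide text, but cents is the reading consistent with the per-petabyte row) and that a petabyte is one million decimal gigabytes; the function name is illustrative.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10**6 GB
MONTHS = 5 * 12


def dollars_per_pb_month(cents_per_gb_month: float) -> float:
    """Convert cents per GB per month into dollars per PB per month."""
    return cents_per_gb_month * GB_PER_PB / 100


for medium, cents in (("disk", 10), ("tape", 0.77)):
    monthly = dollars_per_pb_month(cents)
    print(f"{medium}: ${monthly:,.0f} per PB per month, "
          f"${monthly * MONTHS:,.0f} over {MONTHS} months")
```

The tape result matches the $462 thousand quoted for five years. For disk, the straight 60-month product is lower than the $6.6 million on the slide, so that figure presumably reflects inputs from the cited studies that are not shown here.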
Compliance requirements are key decision criteria for storage
  What does compliance mean?
    To comply with laws and regulations
    Regulations and laws vary by country and industry
    There are common laws and regulations in most countries
      E.g. trade, tax, stock exchange and civil laws demanding data retention
  The key requirement for archive storage is to prevent deletion and changes
    Write-Once-Read-Many (WORM)
  Certificates document an assessment for compliance
    Usually not required by legal authorities
    Help to manage compliance risks
Common compliance requirements
  Specification of the kind of data to be preserved
  Data retention periods
  Write Once Read Many protection
    No deletion or modification of data during the retention time
  Proof of completeness and authenticity
  Data access for auditing authorities during the retention period
  Data and system protection (logical and physical)
  Deletion after expiration
  Compliance must be assured for the entire archive system
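The WORM and retention requirements listed above can be illustrated with a small in-memory model. This is a minimal sketch, not a compliant implementation: `WormStore` and `RetentionError` are hypothetical names, and a real archive store would enforce these rules below the application layer.

```python
from datetime import datetime


class RetentionError(Exception):
    """Raised when an operation would violate WORM or retention rules."""


class WormStore:
    """In-memory toy store: write once, read many, delete only after expiry."""

    def __init__(self):
        self._objects = {}  # key -> (data, retain_until)

    def write(self, key, data: bytes, retain_until: datetime) -> None:
        # Write once: an existing object can never be overwritten
        if key in self._objects:
            raise RetentionError(f"{key!r} already exists and cannot be modified")
        self._objects[key] = (bytes(data), retain_until)

    def read(self, key) -> bytes:
        # Read many: reads are always allowed, e.g. for auditing authorities
        return self._objects[key][0]

    def delete(self, key, now: datetime) -> None:
        # Deletion is only permitted once the retention period has expired
        _, retain_until = self._objects[key]
        if now < retain_until:
            raise RetentionError(f"{key!r} is retained until {retain_until}")
        del self._objects[key]
```

Passing `now` explicitly keeps the sketch testable; a production system would rely on a trusted clock instead.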
Archive storage options - overview
  Disk only
    For file and object
    Fast access
    High change rates
    Higher cost for large capacities over longer periods of time
  Disk and tape
    For file and object
    Fast and slow access
    Low change rate
    Lower cost for large capacities over longer periods of time
  Cloud
    Mostly object
    Medium access
    Cloud is not a storage technique but an architecture and operating model
    Cloud can use disk, tape and other techniques
Non-compliant IBM storage solutions - Overview
  Spectrum Protect
    Protocols: TSM API
    Highlights: embedded backup and migration to tape; replication, high availability; most flexible
    Infrastructure: IBM Systems & Storage, Spectrum Protect
  File & object storage
    Protocols: NFS, CIFS, Swift, S3
    Highlights: embedded backup and migration to tape; replication, high availability; most scalable
    Infrastructure: Storwize V7000 Unified, Spectrum Scale, Spectrum Archive, Spectrum Protect (HSM), Cleversafe
  Block storage
    Protocols: FCP, iSCSI
    Highlights: replication via disk; simplest
    Infrastructure: IBM Storage, IBM Tape
  Overview diagram labels: Spectrum Protect, Disk; File / Object server, Filer, Disk; Disk, Block Storage, Tape Mgmt.
Spectrum Protect server
  Application uses the TSM API via LAN for archive and retrieve
  Spectrum Protect server stores data in storage pools
    Storage pools can be of different storage types
    Data can be migrated between storage pools, based on age and size
    A storage pool can also be object storage or cloud
  Archiving functions
    Backup of data and metadata
    Storage tiering
    Replication using node replication
    End-to-end encryption via the TSM API
    Deduplication
    Cloud connection
  Diagram labels: archive application, TSM API, TSM server, operating system, storage network, flash, disk, tape, cloud
  Whitepaper: see External References
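Migration between storage pools based on age and size can be pictured as a simple selection rule. The sketch below is generic Python, not Spectrum Protect configuration syntax; the class, the function, the thresholds and the choice to require both criteria at once are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    name: str
    size_bytes: int
    age_days: int
    pool: str = "disk"


def migrate(objects, min_age_days, min_size_bytes, target_pool="tape"):
    """Move objects that are old enough and large enough to the target pool."""
    moved = []
    for obj in objects:
        eligible = (obj.pool != target_pool
                    and obj.age_days >= min_age_days
                    and obj.size_bytes >= min_size_bytes)
        if eligible:
            obj.pool = target_pool  # in a real system the data would be copied first
            moved.append(obj.name)
    return moved


objects = [
    StoredObject("scan-001.tif", size_bytes=50_000_000, age_days=400),
    StoredObject("note.txt", size_bytes=2_000, age_days=400),
    StoredObject("video.mp4", size_bytes=900_000_000, age_days=10),
]
print(migrate(objects, min_age_days=365, min_size_bytes=1_000_000))
```

Only the first object is both old and large enough to move in this example; small or recent objects stay on the faster pool.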
Spectrum Scale
  Application runs on a Spectrum Scale node or connects via NAS (NFS, SMB) or object to the Spectrum Scale file system
  The Spectrum Scale file system is available on all cluster nodes
    Data is stored in pools represented by storage devices
    Data can be migrated between pools, based on flexible policies
    Connection to object storage and cloud
  Archiving functions
    Standardized data interfaces (NFS, SMB, S3, Swift)
    Global name space
    Built-in backup function to TSM
    Transparent storage tiering with tape
    Replication (synchronous and asynchronous)
    Encryption & compression
    Native RAID
    Cloud connector
  Diagram labels: app on GPFS client, app on NAS client, GPFS/NFS/SMB/object over TCP/IP network, storage network, flash, disk, tape, cloud
  Whitepaper: see External References
Spectrum Archive
  Application runs on a Spectrum Scale node or connects via NAS (NFS, SMB) or object to the Spectrum Scale file system
  Spectrum Archive integrates with Spectrum Scale
    Facilitates migration and recall of files to LTFS tape
    Migration is controlled by policies
  Archiving functions
    Transparent storage tiering with flash, disk and tape
    Standardized data interfaces (NFS, SMB, S3, Swift)
    Global name space
    Replication (synchronous and asynchronous)
    Encryption & compression
    Native RAID
  Diagram labels: app on GPFS client, app on NAS client, GPFS/NFS/SMB/object over TCP/IP network, storage network, disk, tape
  Whitepaper: see External References
IBM Cloud Object Storage - Cleversafe
  Application connects via an object API (S3, Swift, simple object) to Cleversafe
  Cleversafe Accesser nodes receive data and distribute it to Slicestores
    Leverages the Information Dispersal Algorithm (IDA) to slice objects and perform erasure encoding
    Object slices are stored on Slicestores
    Provides data availability across locations
  Can be deployed on-premises, hybrid or off-premises
  Archive functions
    Cost efficient with innovative IDA
    Built-in encryption
    Geographical dispersal
    Central management
    Optimized for object storage
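The idea of slicing an object and adding redundancy so that it survives the loss of a slice can be shown with a deliberately simple scheme. The sketch below uses single XOR parity, which tolerates the loss of exactly one slice; it is a stand-in for illustration and is not the Information Dispersal Algorithm that Cleversafe uses. All function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def disperse(data: bytes, k: int) -> list:
    """Split data into k equal-length slices plus one XOR parity slice."""
    slice_len = -(-len(data) // k)             # ceiling division
    padded = data.ljust(slice_len * k, b"\0")  # pad so all slices are equal
    slices = [padded[i * slice_len:(i + 1) * slice_len] for i in range(k)]
    parity = reduce(xor_bytes, slices)
    return slices + [parity]


def rebuild(slices: list, original_len: int) -> bytes:
    """Reassemble the data when at most one slice (given as None) is missing."""
    missing = [i for i, s in enumerate(slices) if s is None]
    if len(missing) > 1:
        raise ValueError("single parity can only recover one lost slice")
    slices = list(slices)
    if missing:
        present = [s for s in slices if s is not None]
        slices[missing[0]] = reduce(xor_bytes, present)  # XOR of the rest
    return b"".join(slices[:-1])[:original_len]


data = b"archive me safely"
stored = disperse(data, k=4)  # imagine each slice on a different store
stored[2] = None              # one store becomes unavailable
print(rebuild(stored, len(data)))
```

Production erasure codes generalize this so that several slices can be lost at once, which is what makes dispersal across locations practical.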
IBM archive storage solutions provide the best for your workload
  Leverage Spectrum Scale as the central point for tape tiering and hybrid cloud
  Spectrum Archive (tape storage tiering)
    Colder and long-term archives
    Cost efficiency with tape
    Transparency and automation
  Spectrum Scale
    Big data analytics
    Unified file (NAS) and object storage
    Multi-site file collaboration
  IBM Cloud Object Storage (Cleversafe) (Transparent Cloud Tiering)
    Active archives
    Geographic dispersal
    Object storage
  IBM Cloud Object Storage (Cleversafe) can be used without Spectrum Scale
Compliant IBM storage solutions - Overview
  Spectrum Protect (SSAM)
    Protocols: TSM API
    Highlights: embedded backup and migration to tape; replication, high availability; assessed for compliance; most flexible
    Infrastructure: IBM Systems & Storage, Spectrum Protect for Data Retention (SSAM)
  Spectrum Scale immutability
    Protocols: NFS, CIFS, POSIX
    Highlights: embedded backup and migration to tape; replication, high availability; assessed for compliance; most scalable
    Infrastructure: Spectrum Scale, Spectrum Archive, Spectrum Protect (HSM)
  WORM tape
    Protocols: FCP, iSCSI
    Highlights: good streaming performance; tape is green; assessed for compliance; simpler
    Infrastructure: IBM WORM Tape
  Overview diagram labels: SSAM, Disk; Spectrum Scale, Filer, Disk; Disk, Tape Mgmt.
Spectrum Protect for Data Retention (SSAM) - Overview
  Application uses the TSM API via LAN for archive and retrieve
  SSAM is a special version of TSM, enriched with immutability features, that stores data in storage pools
    Storage pools can be of different storage types
    Data can be migrated between storage pools, based on age and size
    Data cannot be deleted prior to expiration
    A storage pool can also be object storage or cloud
  Archiving functions
    Assessed for compliance
    Built-in backup of data and metadata
    Storage tiering
    End-to-end encryption via the TSM API
    Deduplication
    Direct migration path from DR550 and IA
    Cloud connection
  Diagram labels: archive application, TSM API, SSAM server, operating system, storage network, flash, disk, tape, cloud
  Whitepaper: see External References
Spectrum Scale Immutability
  Application runs on a Spectrum Scale node or connects via NAS (NFS, SMB) to a Spectrum Scale fileset
  The Spectrum Scale fileset is configured for immutability
    Allows file retention management with a SnapLock-like function
    Encode the retention time in the last access date and set the file to read-only
    Can leverage many Spectrum Scale archive functions
  Archiving functions
    Assessment for compliance is planned for 3Q16
    Standardized data interfaces (NFS, SMB, S3, Swift)
    Global name space
    Built-in backup function to TSM
    Transparent storage tiering with tape (Spectrum Archive)
    Replication (synchronous)
    Encryption & compression
    Native RAID
  Diagram labels: app on GPFS client, app on NAS client, GPFS/NFS/SMB over TCP/IP network, storage network, flash, disk, tape
  Whitepaper: see External References
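From the application's point of view, "encode the retention time in the last access date and set read-only" comes down to two standard file operations. The sketch below uses only Python's standard library; the helper names are hypothetical, and on an ordinary file system these calls merely record a convention. Actual enforcement is what the immutable fileset is assumed to add.

```python
import os
import stat
import time


def set_retention(path: str, retain_seconds: int) -> float:
    """Store the retention end time in the file's atime and make it read-only."""
    info = os.stat(path)
    retain_until = time.time() + retain_seconds
    os.utime(path, (retain_until, info.st_mtime))  # (atime, mtime); mtime unchanged
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    os.chmod(path, stat.S_IMODE(info.st_mode) & ~write_bits)  # drop write permission
    return retain_until


def is_retained(path: str) -> bool:
    """True while the retention time encoded in atime lies in the future."""
    return os.stat(path).st_atime > time.time()
```

An application would call `set_retention(path, seconds)` once after writing a file and could later use `is_retained(path)` to decide whether the file is eligible for deletion.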
Summary
  An archive system comprises archive sources, archive management and archive storage
  The key decision criterion for selecting archive storage is compliance
  IBM provides all components for archiving solutions on-premises, hybrid or in the cloud
  Compliant and non-compliant IBM archive storage offers value-adding functions
    Better protection with integrated backup and recovery functions
    Better TCO with integrated storage tiering across different storage technologies
    Better operations with integrated data migration functions to new storage technologies
    Better integration with standardized data interfaces
  Tape helps to optimize cost and provides better protection
External References
  Book: Storage Networks Explained: http://www.wiley-vch.de/publish/en/books/forthcomingtitles/ee00/0-470-74143-0/?sid=502qbja41e6nfl1sl0qs1v9tc4
  SNIA: 100 Year Archive Requirements Survey: http://www.snia.org/sites/default/files2/100yratf_archive-requirements-survey_20070619.pdf
  Total cost of ownership studies for disk and tape storage solutions:
    http://www.clipper.com/research/tcg2013009.pdf
    http://www.esg-global.com/blogs/active-archival-storage-a-cost-of-ownership-analysis/
  SSAM home page: http://www-306.ibm.com/software/tivoli/products/storage-mgr-data-reten/
  SSAM 6.3 assessment report by KPMG: http://www.kpmg.de/bescheinigungen/requestreport.aspx?38076
  Spectrum Archive home page: http://www-03.ibm.com/systems/storage/tape/ltfs/index.html
  IBM Tape home page: http://www-03.ibm.com/systems/storage/tape/
  Whitepaper: File archiving solutions with Spectrum Protect for Data Retention: https://www-03.ibm.com/support/techdocs/atsmastr.nsf/webindex/wp100901
  Whitepaper: Archiving solutions with Spectrum Archive: https://www-03.ibm.com/support/techdocs/atsmastr.nsf/webindex/wp102643
  Whitepaper: Spectrum Scale ILM Policies: https://www-03.ibm.com/support/techdocs/atsmastr.nsf/webindex/wp102642
  Whitepaper: Spectrum Protect for Data Retention solutions: https://www-03.ibm.com/support/techdocs/atsmastr.nsf/webindex/wp102624
  Whitepaper: Spectrum Scale immutability Introduction and Use Cases: https://www-03.ibm.com/support/techdocs/atsmastr.nsf/webindex/wp102620
Disclaimer Important notes: This information is provided on an "AS IS" basis without warranty of any kind, express or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Some jurisdictions do not allow disclaimers of express or implied warranties in certain transactions; therefore, this statement may not apply to you. IBM reserves the right to change product specifications and offerings at any time without notice. This publication could include technical inaccuracies or typographical errors. References herein to IBM products and services do not imply that IBM intends to make them available in all countries. IBM makes no warranties, express or implied, regarding non-ibm products and services, and any implied warranties of merchantability and fitness for a particular purpose. IBM makes no representations or warranties with respect to non-ibm products. Warranty, service and support for non-ibm products is provided directly to you by the third party, not IBM. When referring to storage capacity, GB stands for one billion bytes; accessible capacity may be less. Maximum internal hard disk drive capacities assume the replacement of any standard hard disk drives and the population of all hard disk drive bays with the largest currently supported drives available from IBM. IBM Information and Trademarks The following terms are trademarks or registered trademarks of the IBM Corporation in the United States or other countries or both: the e-business logo, IBM, system x, system p, System Storage SnapLock is a registered trademark of Network Appliance Corporation in the United States and other countries Intel, Pentium 4 and Xeon are trademarks or registered trademarks of Intel Corporation. Microsoft Windows is a trademark or registered trademark of Microsoft Corporation. Linux is a registered trademark of Linus Torvalds. Other company, product, and service names may be trademarks or service marks of others. 
Acknowledgements
  Thanks to Tom Clark (Chief Architect Storage Software), Frank Kraemer (Client Technical Architect, IBM Germany) and Ulf Troppens (Spectrum Scale Development) for the valuable feedback shaping this presentation.