Scale-out Object Store for PB/hr Backups and Long Term Archive April 24, 2014 Gideon Senderov Director, Advanced Storage Products NEC Corporation of America
Long-Term Data in the Data Center. [Chart: Consumption of Enterprise Disk Capacity by Type (EB), 2011-2016, for Content Depots & Cloud, Unstructured data, Replicated data, and Structured data; growth labels on the chart: 83% CAGR, 46.0%, 47.5%, 28.8%, 22.5%.] Source: IDC, November 2012 Page 2 NEC Corporation of America 2014
PBBA Market Forecast (CY2012 - CY2017): Market is growing fast. (Source: IDC)
Archive Market Trend. Rapid growth is expected in the disk-based archive market due to wide usage of active archive: a CAGR of 35%, reaching $5.7B WW in 2016. File- and Object-Based Storage (FOBS) market exceeding $23B in 2013, projected to grow at a CAGR of 25% and reach $38B WW in 2017 (IDC, August 2013). [Chart: WW disk-based archive sales forecast, sales revenue ($M), 2010-2016, CAGR 35%.] Source: IDC, June 2013
Storage Infrastructure Spending
Scale-out Object Storage. Object Storage: A storage architecture that manages data as objects, as opposed to other storage architectures like file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Object storage systems allow relatively inexpensive, scalable and self-healing retention of massive amounts of unstructured data. Source: Wikipedia, April 2014
Legacy Solutions: Scalability Limitations. Inadequate scalability of capacity & performance; cannot scale performance to keep up with data growth; multiple products with different architectures; more siloed capacity to manage. Limited deduplication scope: limited scalability proliferates duplicate data across appliances; lower deduplication ratio for large environments.
Local vs. Global Deduplication. Single deduplication repository across the entire solution. Data deduplication across ALL data from ALL nodes. Cross-node deduplication for greater efficiency. Cross-application deduplication leveraging application-aware deduplication.
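The difference between per-node (local) and single-repository (global) deduplication can be illustrated with a minimal sketch. The chunk contents, node names, and the use of SHA-256 as the fingerprint are assumptions made for illustration; the slide does not describe the product's actual chunking or hashing.

```python
import hashlib


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint used to detect duplicate chunks (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk).hexdigest()


def stored_chunks_local(streams_by_node: dict) -> int:
    """Each node keeps its own index, so duplicates across nodes are stored again."""
    return sum(
        len({fingerprint(c) for c in chunks})
        for chunks in streams_by_node.values()
    )


def stored_chunks_global(streams_by_node: dict) -> int:
    """One index spans all nodes, so each unique chunk is stored once."""
    index = set()
    for chunks in streams_by_node.values():
        index.update(fingerprint(c) for c in chunks)
    return len(index)


# Hypothetical backup streams landing on two different nodes.
streams = {
    "node-a": [b"os-image", b"mail-db", b"os-image"],
    "node-b": [b"os-image", b"mail-db", b"crm-db"],
}

local_count = stored_chunks_local(streams)
global_count = stored_chunks_global(streams)
print(f"local dedup stores {local_count} chunks, global dedup stores {global_count}")
```

In this toy input, the chunks shared by both nodes are stored twice under local deduplication and once under global deduplication.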
Legacy Solutions: Resiliency Limitations. Day 1 (Full): blocks 1 7 1 4 4 6 2 1. Day 2 (Incremental): blocks 1 6 6 3. Day 3 (Incremental): blocks 5 1 1 6 4 7 1 6. Day 7 (Full): blocks 1 4 7 8. Unique blocks stored after deduplication: 1 2 3 4 5 6 7 8. Q: What data can be restored if block #1 is lost? A: NONE! Traditional RAID is not sufficient for deduplicated data.
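The slide's question can be checked mechanically. The sketch below takes the block sequences exactly as listed on the slide and reports which backups remain restorable after a given unique block is lost; the function names are illustrative only.

```python
from collections import Counter

# Block references per backup, as listed on the slide.
backups = {
    "Day 1 (full)": [1, 7, 1, 4, 4, 6, 2, 1],
    "Day 2 (incremental)": [1, 6, 6, 3],
    "Day 3 (incremental)": [5, 1, 1, 6, 4, 7, 1, 6],
    "Day 7 (full)": [1, 4, 7, 8],
}


def restorable(backups: dict, lost_blocks) -> list:
    """A backup is restorable only if it references none of the lost blocks."""
    lost = set(lost_blocks)
    return [name for name, refs in backups.items() if lost.isdisjoint(refs)]


def reference_counts(backups: dict) -> Counter:
    """How many times each unique block is referenced across all backups."""
    counts = Counter()
    for refs in backups.values():
        counts.update(refs)
    return counts


print("restorable after losing block 1:", restorable(backups, [1]))
print("restorable after losing block 8:", restorable(backups, [8]))
print("references to block 1:", reference_counts(backups)[1])
```

Because every backup references block 1, losing that single stored block makes all four backups unrestorable, whereas losing block 8 affects only the Day 7 full backup.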
Scale-out Scalable Grid Storage Architecture. Community of Smart Nodes: Accelerator Nodes, Storage Nodes, Hybrid Nodes. Nodes: industry-standard servers; multiple types allowed; heterogeneous & open. Unrestricted Scalability: performance scalability; capacity scalability. Intelligent Management SW: fully distributed system; self-aware & self-organizing; data management services; virtualizes hardware platform.
Adaptive Grid Storage For Long-Term Data. Online upgrade/expansion with multi-generation nodes: non-disruptive remove; non-disruptive add/upgrade; V2 Grid + V3 + V4 + Vx = 1 System. Enable in-place technology refresh with no data migration. Ever greener storage with faster, denser components. Enable continuous data availability. Reduce CapEx and OpEx with deduplication. Non-disruptive scaling from dynamic auto-provisioning storage.
Hands-Free Management. Simple, fast deployment: < 45 minutes to backup & archive; self-discovering capacity; no storage provisioning; no tape emulation tasks. Self-tuning and resource management: optimized performance & capacity. Self-healing: automatic recovery across resources. Web-browser GUI: monitoring and planning.
Advanced Erasure-Coded Data Resiliency. User-dialable disk/node protection: default protection against 3 concurrent failures; dynamically allocated; intermix of multiple resiliency levels (1-6) for different applications. Greater protection with lower overhead: default setting of 25% capacity overhead; 1.5x greater protection than traditional RAID 6, with lower overhead and faster recovery; no idle spare drives. Faster self-healing with less performance degradation: only data is reconstructed rather than the entire drive; data is reconstructed across multiple spindles.
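The default figures on this slide are mutually consistent under one simple reading, sketched below. Two assumptions are made that the slide does not state: that each chunk is split into a fixed total of 12 fragments (inferred from 3 parity fragments corresponding to 25% overhead), and that resiliency level N means N parity fragments, that is, N tolerated concurrent failures.

```python
# Assumption: fixed number of fragments per chunk, inferred from 3 / 0.25 = 12.
TOTAL_FRAGMENTS = 12
# RAID 6 tolerates two concurrent drive failures per group.
RAID6_TOLERATED_FAILURES = 2


def capacity_overhead(parity: int, total: int = TOTAL_FRAGMENTS) -> float:
    """Fraction of raw capacity consumed by parity fragments."""
    if not 0 < parity < total:
        raise ValueError("parity must be between 1 and total - 1")
    return parity / total


def raw_per_user_byte(parity: int, total: int = TOTAL_FRAGMENTS) -> float:
    """Raw bytes written for each byte of (already deduplicated) user data."""
    return total / (total - parity)


for level in range(1, 7):  # resiliency levels 1-6 from the slide
    print(
        f"level {level}: tolerates {level} failures, "
        f"overhead {capacity_overhead(level):.1%}, "
        f"raw/user {raw_per_user_byte(level):.2f}"
    )

# Default level 3 versus RAID 6, measured in tolerated concurrent failures.
protection_vs_raid6 = 3 / RAID6_TOLERATED_FAILURES
print("protection relative to RAID 6:", protection_vs_raid6)
```

Under these assumptions, level 3 yields the slide's 25% overhead and the 1.5x figure is the ratio of tolerated failures (3 versus 2).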
Lab Benchmark Test
Scale-out vs. Scale-up Data Deduplication. Scale-up (controller with disk shelves): single or multi-controller; fixed maximum throughput; scalability limited by controller physical capabilities. Scale-out (Accelerator Nodes (AN) in front of Storage Nodes (SN)): modular scale-out front-end; scalable maximum throughput; scalability independent of physical capabilities, with linear performance.
Scale-out Inline Global Data Deduplication. Distributed Two-Tier Architecture (Accelerator Nodes (AN) in front of Storage Nodes (SN)): independent linear scalability of performance & capacity. Global Deduplication: data deduplication across ALL data from ALL nodes. Distributed Hash Table: data routed to responsible Storage Node; deduplication & hash table processing scales linearly with Storage Nodes; prevents silos of deduped data.
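A minimal sketch of hash-based routing follows, to show why a distributed hash table avoids deduplication silos: the chunk's hash alone decides which Storage Node is responsible, so identical chunks always meet at the same node regardless of which front-end received them. The `Grid` class, the SHA-256 fingerprint, and modulo placement are illustrative assumptions, not the product's actual scheme; a real system would likely use a placement method that limits remapping when nodes are added.

```python
import hashlib


def chunk_hash(chunk: bytes) -> int:
    """Integer fingerprint of a chunk (SHA-256 is an assumption)."""
    return int.from_bytes(hashlib.sha256(chunk).digest(), "big")


class Grid:
    """Toy grid: each storage node owns one slice of the global hash table."""

    def __init__(self, storage_nodes):
        self.nodes = list(storage_nodes)
        self.index = {node: set() for node in self.nodes}

    def responsible_node(self, chunk: bytes) -> str:
        # Simplified placement: hash modulo node count.
        return self.nodes[chunk_hash(chunk) % len(self.nodes)]

    def write(self, chunk: bytes) -> bool:
        """Route the chunk to its node; return True if it was new and stored."""
        h = chunk_hash(chunk)
        node = self.nodes[h % len(self.nodes)]
        if h in self.index[node]:
            return False  # duplicate: only a reference would be recorded
        self.index[node].add(h)
        return True


grid = Grid(["sn-1", "sn-2", "sn-3", "sn-4"])
first = grid.write(b"block-from-accelerator-1")
second = grid.write(b"block-from-accelerator-1")  # same data via another front-end
stored = sum(len(slice_) for slice_ in grid.index.values())
print(first, second, stored)
```

Each node only looks up hashes in its own slice, which is why hash-table processing can scale with the number of Storage Nodes.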
Linear Performance Scalability
Scale-out Deduplication Performance
Application Metadata Impact. [Diagram labels: Clients, Files, Agent Operation, File Aggregation (tar), Blocking, Backup Server, Storage Media, Metadata, Filtering.] Application metadata makes user data appear different. Inserted metadata reduces deduplication efficiency.
Application-Aware Deduplication: Capacity Optimization through Enhanced Deduplication. [Chart: Data (Bytes) vs. Time (Weeks) for Original Data, Generic Dedupe, and Application-Aware Dedupe.] Application-aware deduplication leverages format awareness to filter metadata inserted by the application and deduplicate the data payload separately. Application awareness can increase the reduction ratio by 130% or more.
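The effect of inserted metadata can be reproduced with a toy backup format. Everything about the format below is hypothetical (an 8-byte header carrying a backup ID and sequence number in front of each 64-byte payload); it only demonstrates how per-backup headers defeat generic chunk matching, and how separating header from payload restores it.

```python
import hashlib

# Hypothetical record layout: 8-byte application header + 64-byte payload.
HEADER_LEN = 8
PAYLOAD_LEN = 64
RECORD_LEN = HEADER_LEN + PAYLOAD_LEN


def make_backup(payloads, backup_id: int) -> bytes:
    """Simulate a backup application that prefixes every payload with metadata."""
    out = bytearray()
    for seq, payload in enumerate(payloads):
        header = backup_id.to_bytes(4, "big") + seq.to_bytes(4, "big")
        out += header + payload
    return bytes(out)


def generic_chunks(stream: bytes, size: int = RECORD_LEN) -> list:
    """Format-unaware fixed-size chunking: headers stay mixed into the chunks."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def app_aware_chunks(stream: bytes):
    """Format-aware parsing: split each record into metadata and payload."""
    metadata, payloads = [], []
    for i in range(0, len(stream), RECORD_LEN):
        record = stream[i:i + RECORD_LEN]
        metadata.append(record[:HEADER_LEN])
        payloads.append(record[HEADER_LEN:])
    return metadata, payloads


def unique_bytes(chunks) -> int:
    """Bytes actually stored when each distinct chunk is kept once."""
    seen = {hashlib.sha256(c).digest(): len(c) for c in chunks}
    return sum(seen.values())


# The same ten payloads are backed up twice; only the headers differ.
data = [bytes([k]) * PAYLOAD_LEN for k in range(10)]
backup_1 = make_backup(data, backup_id=1)
backup_2 = make_backup(data, backup_id=2)

generic_stored = unique_bytes(generic_chunks(backup_1) + generic_chunks(backup_2))

meta_1, pay_1 = app_aware_chunks(backup_1)
meta_2, pay_2 = app_aware_chunks(backup_2)
aware_stored = unique_bytes(meta_1 + meta_2 + pay_1 + pay_2)

print(f"generic: {generic_stored} bytes, application-aware: {aware_stored} bytes")
```

In this constructed case the generic chunker finds no duplicates at all, because every chunk contains a header that changes between backups, while the format-aware path stores the payloads once and only the small headers twice.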
Disaster Recovery and Cloud. [Diagrams: WAN-optimized Replication (Site A replicating to Site B); Geo-Distributed Grid (Site A, Site B, Site C).]
WAN-Optimized Replication. [Diagram: Accelerator Nodes and Storage Nodes; many-to-one replication; many-to-many replication.] Asynchronous grid-to-grid WAN-optimized replication for DR. Deduplication across all replicated HYDRAstor grids: minimizes network bandwidth requirements; minimizes DR site capacity requirements. Policy-based data selection: file system granularity. In-flight encryption.
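Why deduplication across replicated grids reduces both bandwidth and DR capacity can be sketched as follows. This is a simplified model under stated assumptions: the target keeps an index of chunk fingerprints, the source sends a chunk only when its fingerprint is absent, and the cost of exchanging fingerprints is ignored. The function and variable names are hypothetical and do not describe the RepliGrid protocol.

```python
import hashlib


def replicate(source_chunks, target_index: dict) -> int:
    """Send only chunks the DR site lacks; return payload bytes sent over the WAN."""
    bytes_sent = 0
    for chunk in source_chunks:
        h = hashlib.sha256(chunk).digest()
        if h not in target_index:
            target_index[h] = chunk  # chunk crosses the WAN and is stored once
            bytes_sent += len(chunk)
    return bytes_sent


# Many-to-one: two remote sites replicate to the same DR grid.
dr_index = {}
site_a = [b"A" * 100, b"shared" * 20, b"A" * 100]
site_b = [b"shared" * 20, b"B" * 100]

sent_a = replicate(site_a, dr_index)
sent_b = replicate(site_b, dr_index)
dr_capacity = sum(len(c) for c in dr_index.values())
print(sent_a, sent_b, dr_capacity)
```

The second site transfers only the chunk the DR grid has not already received from the first site, and the DR grid stores each distinct chunk once.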
Customer Configuration Example: Replication across primary and secondary datacenter grid systems. Replication from remote sites within the U.S., and across the Atlantic and the Pacific.
HYDRAstor Gen-4 Family. [Chart: Performance vs. Capacity, from Single Node Model to Grid Model. Node Building Blocks: Hybrid Node, Storage Node.]
HS3-410: 3.2TB/hr, 3.6TB/hr (OST); 8-24TB Raw; 104-312TB Effective
HS8-4001: 4.9TB/hr, 6.3TB/hr (OST); 12-48TB Raw; 156-624TB Effective
HS8-4001-96: 4.9TB/hr, 5.6TB/hr (OST); 96TB Raw; 1.2PB Effective
HS8-4002-96: 9.7TB/hr, 12.6TB/hr (OST); 96TB Raw; 1.2PB Effective
HS8-4002-192: 9.7TB/hr, 12.6TB/hr (OST); 192TB Raw; 2.5PB Effective
HS8-4006-720: 29.2TB/hr, 37.8TB/hr (OST); 720TB Raw; 9.4PB Effective
HS8-4165-7920: 802TB/hr, 1,040TB/hr (OST); 7,920TB Raw; 103PB Effective
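The raw and effective capacities listed for the Gen-4 models imply a roughly constant effective-to-raw ratio. The sketch below computes it from the slide's own figures; the 13:1 value is inferred from those figures and is not stated on the slide, and PB values are converted at 1 PB = 1,000 TB as an assumption.

```python
# (model, raw TB, effective TB) taken from the slide; PB converted at 1,000 TB.
capacities = [
    ("HS3-410 (min)", 8, 104),
    ("HS3-410 (max)", 24, 312),
    ("HS8-4001 (min)", 12, 156),
    ("HS8-4001 (max)", 48, 624),
    ("HS8-4002-96", 96, 1200),
    ("HS8-4002-192", 192, 2500),
    ("HS8-4006-720", 720, 9400),
    ("HS8-4165-7920", 7920, 103000),
]

# Inferred, not stated on the slide: effective capacity is about 13x raw.
IMPLIED_RATIO = 13


def effective_ratio(raw_tb: float, effective_tb: float) -> float:
    """Effective capacity per unit of raw capacity."""
    return effective_tb / raw_tb


ratios = {model: effective_ratio(raw, eff) for model, raw, eff in capacities}
for model, ratio in ratios.items():
    print(f"{model}: {ratio:.2f}x")

# Largest relative deviation of the listed figures from a flat 13:1 assumption.
max_deviation = max(
    abs(eff - raw * IMPLIED_RATIO) / eff for _, raw, eff in capacities
)
print(f"max deviation from {IMPLIED_RATIO}:1 = {max_deviation:.1%}")
```

The small deviations for the larger models are consistent with the effective figures having been rounded to two or three significant digits.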
Common Code Base and Functionality. Common code/features and modular scalability across models. Intermix of multi-generation nodes in the same grids. All software features supported for the entire product line. DRD protection: advanced erasure-coded data resiliency. DataRedux: inline application-aware global data deduplication and compression. Cloning/Snapshot: instant fully deduplicated file system or file R/W copy. Dynamic Data Shredding: data shredding for deleted classified data. AN Failover: front-end failover for high availability. RepliGrid (Option): WAN-optimized replication with in-flight encryption. HYDRAlock (Option): WORM file system functionality. Encryption at Rest (Option): encryption to protect data at rest. HYDRAstor OpenStorage Suite (Options): Express I/O, lightweight data transport for high throughput; Dynamic I/O, adaptive I/O load balancing across nodes; Optimized Copy, WAN-optimized copy services; Optimized Synthetics, storage-synthesized full backups.
Scale-out Storage for Long-Term Data: 1 PB/hr, 100 PB
Potential Enhancement Directions: Performance, Functionality