Atrato SOLVE - Scalable Offload Logical Volume Engine
Dr. Sam Siewert
Atrato Velocity Series
Atrato Virtualization Software (AVS) - new levels of storage virtualization:
- Application intelligence and autonomics
- Industry's first automatic, application-aware tiering
- Data Access Controller with highly redundant/available configurations
- Seamless integration into existing data centers
- Host for ApplicationSmart
SAID - Data Enclosure:
- Highly parallel access, high speed
- Extremely dense, high capacity
- Architecture scales with new technologies
- Self-maintaining, low power
Atrato Integration Options
- SAN (Storage Area Network): servers issue block-level commands directly to storage
- NAS (Network Attached Storage): NAS clients issue file-level commands (NFS, CIFS) to NAS servers, which issue block-level commands over the SAN to storage
Business problems we solve:
- Data/content access too slow; content search too slow
- Storage cost too high; scalability too expensive
- Security/encryption: data otherwise unsecured
- Maintenance and power too costly
Atrato Delivers Data Center Value
Extreme capacity, performance density:
- Highest disk density per rack
- 8x acceleration of Tier 0 and Tier 1
- Up to 100x lower latency
- Optional tiering down for capacity
Simplified, intelligent management:
- Self-healing, self-maintaining, self-optimizing
- Automatic tiering aligns resources to business needs
- Frees IT staff
Lowest TCO, most green:
- 50%-80% less power and cooling
- 75% less data center space
- No overbuying; reduced maintenance costs
Atrato Storage Markets
- Digital media: high-performance digital studio, pre/post production
- Financial analysis: Monte Carlo
- Social networks / Web 2.0
- Enterprise: database / Microsoft Exchange
- IPTV / VOD
- Oil and gas exploration
- Video surveillance analytics
ApplicationSmart Tiered Management - Integration with NAS, FusionIO
New Data Center Storage: ApplicationSmart Self-Optimization
Storage cache problem:
- Managed by the OS, which tracks data access to prioritize the order of its cache directory
- Too small: GBs where TBs are needed; limited memory; static data only
Traditional storage arrays - add more spindles:
- Miss penalty too high; over-provisioning
- Increased rack space; added power and cooling costs
Linear cache - imprecise, not adaptive:
- No random access; expensive; cache misses slow access
ApplicationSmart replaces the need for cache:
- Manages cacheable data in real time
- If data access is completely random, it is served by SAID spindle density (Atrato V1000)
- Identifies cacheability by initiator and application; computes speed-up
- Terascale SSD, petascale SAID
- Profile monitors, access-change detection and adaptation, multimodal Tier 0 replication (Intelligent Block Manager)
Self-Optimizing Intelligence: ApplicationSmart
AVS included features:
- Access Profiler: adaptive histogram, highly compressed, scales to petabytes; mixed profiling at block level and file level, very precise
- Egress Accelerator: accelerates IO for high-access content; detects sequential/random initiator streams; adjusts the read-ahead cache
- SLM (SSD LUN Manager): full AVS VLUN creation and management; SSD storage pool with data-lifetime protection options
Upgrade features:
- TME (Tiered Management Engine): dynamic block migration as access patterns change
- Ingest Accelerator: tuned for RAID access (FIFO with back-end IO reforming); lower latency, higher throughput, higher access rates
Spectrum of Workloads and ApplicationSmart Acceleration
- Sequential (fully predictable): ingest IO reforming and egress IO read-ahead via SLC/RAM FIFOs
- Hot-spots (semi-predictable): scalable MLC flash
- Random (non-cacheable): solved by SAID spindle density
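As a sketch of the stream detection mentioned above (the egress accelerator's detector for sequential vs. random initiator streams), a classifier over recent LBAs could look like the following. The function name, threshold, and tolerance are illustrative assumptions, not Atrato's implementation.

```python
def classify_stream(lbas, tolerance=8):
    """Classify a recent stream of LBA requests from one initiator.

    Returns 'sequential' if most consecutive requests land within
    `tolerance` blocks ahead of the previous request (so read-ahead
    would pay off), else 'random'.
    """
    if len(lbas) < 2:
        return "sequential"
    # Count forward steps that stay close to the previous address.
    near = sum(1 for a, b in zip(lbas, lbas[1:]) if 0 <= b - a <= tolerance)
    return "sequential" if near / (len(lbas) - 1) >= 0.75 else "random"
```

A detector like this would let the engine enable read-ahead only for streams where prefetched blocks are actually likely to be used.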
How Is ApplicationSmart Different? Actual VoD Application with Atrato ApplicationSmart Cache
Data Profiler - histogram analysis:
- Identifies access hot-spots
- Notes when access changes are statistically significant
- Mapping integrates with the virtualization engine
Histogram groupings:
- Accelerates IO for high-access content
- Replicates blocks when statistically significant
- Virtualization allows real-time updates
- End-user customizable
SAN-Scalable JBOF (Petascale)
- InfiniBand or 10G FCoE/iSCSI scaling, 20 Gbps minimum
- 2 TB per unit, scaled on the SAN rather than on the controller
- Linear Tier-0 scaling with V1000s
- Per unit: Mellanox 2-port 20G adapter, LSI MegaRAID 8888, AVS; JBOF 2TB, 1RU
Ingest FIFO IO Reforming (Tier-0 Array)
- A large single host IO (e.g. 2115 KB) enters the AVE ingest FIFO
- The AVE reforms it into 16 threaded 128 KB back-end IOs to the SAID (RAID-0, SATA ROC), plus a final single 67 KB IO for the remainder
- A single IO completion is returned to the host
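The arithmetic of the reforming above can be sketched as follows; this is a minimal illustration of splitting one large IO into chunk-sized threaded IOs, with names chosen for clarity rather than taken from AVS.

```python
CHUNK_KB = 128  # back-end threaded IO size from the slide

def reform_io(offset_kb, length_kb, chunk_kb=CHUNK_KB):
    """Split one large ingest IO into chunk-sized (offset, length) IOs.

    The final IO carries any remainder smaller than chunk_kb.
    """
    ios = []
    pos = offset_kb
    end = offset_kb + length_kb
    while pos < end:
        size = min(chunk_kb, end - pos)
        ios.append((pos, size))
        pos += size
    return ios
```

For the slide's 2115 KB example this yields 16 full 128 KB IOs (2048 KB) plus one 67 KB remainder IO, which the back end can issue as parallel threads before signaling a single completion.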
V1000 Random and Sequential IO Performance, Measured (50TB SAID)
[Chart: IOPS (0-22,000) and throughput (0-2,200 MB/s) vs. IO transfer size (4-1024 KB) for random/sequential reads and writes; annotated peak of 1870 MB/s at 256 KB. Configuration: DL580 G5 controller, one SAID with 320 GB 7200 RPM HDDs, AVS 2.0, RAID 10, aio-stress, 11/13/08]
Scaled-out 10TB SSD + 20TB HDD Test Configuration
- 21U total
- 5 controllers
- 5 SSD JBOFs, 10 SSDs per controller
- 1 SAID 320
- 20 10GE ports
Why This Is Interesting to HPC
Scalable metadata on solid-state and/or hybrid VLUNs:
- TSM for deep archive
- Oracle RAC Linux index/log metadata
- IBRIX metadata; cluster, GPFS, Lustre MDS
Scalable HDD and/or hybrid VLUNs:
- DB store
- Lustre OSTs
- IBRIX, GPFS, Exanet volumes
SAN-scalable SSD and HDD:
- FC, InfiniBand (SRP), 10GE iSCSI TOE
- Solid-state and HDD storage pools with unified VLUN management
- All-SSD, hybrid SSD+HDD, and all-HDD co-managed
- Linear scaling through SAN and NAS volume managers and servers
- Can surface a VLUN locally on a controller node for integrated NAS/SAN (OEM integration only)
- NAS partner integration for fully featured NAS or parallel file systems
Hybrid Integration: SSDs Drive IOPS and Expand the Performance Envelope
[Chart: IOPS and bandwidth vs. block size - SSDs reach roughly 400K IOPS at small block sizes; HDDs sustain roughly 2 GB/s (20K IOPS) at large block sizes]
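The shape of that envelope follows from the identity bandwidth = IOPS x block size: small blocks stress the IOPS limit (where SSDs win), while large blocks stress the bandwidth limit (where the HDD back end delivers). A tiny helper, illustrative rather than from the slides, makes the trade-off concrete.

```python
def iops(bandwidth_mb_s, block_kb):
    """IOPS achievable at a given sustained bandwidth and block size.

    bandwidth_mb_s: sustained throughput in MB/s
    block_kb: IO request size in KB
    """
    return bandwidth_mb_s * 1024 / block_kb
```

For example, 2 GB/s at 512 KB blocks is only about 4K IOPS, while even 200 MB/s at 512-byte requests is about 400K IOPS, matching the chart's two regimes.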
ACTUAL Performance of Hybrid VLUN from 2U SSD and SAID
[Chart: SSD+HDD hybrid VLUN performance synergy - IOPS (0-450,000) and bandwidth (0-2,500 MB/s) vs. IO request size (0-250 KB); series: IOPS required, HDD IOPS, SSD IOPS, HDD bandwidth, SSD bandwidth, hybrid bandwidth]
1M IOPS Test Install at Microsoft
2U server: peak 710K IOPS (512-byte random reads) off ONE iSCSI initiator
- Saturated two quad-core CPUs using two 10G iSCSI Intel X-520 links
- Back end at 1/5th of possible loading (3.5M IOPS back-end test)
4U server test (TBD): 3650 M2 with six-core Westmere; 3850 M3 3.2 GHz quad-socket Nehalem/Tylersburg
- Expect to hit 1.4M IOPS off ONE initiator
Stressing Windows Capability
- Win2k8 R2, SINGLE initiator, near 1M IOPS
The Bottom Line: Hybrid Storage Delivers the Flexibility to Solve Problems
Fundamental storage customer requirements: performance, capacity, scalability, cost
Performance/scalability cost ladder:
- RAM scaling: $$$$$
- SSD scaling: $$$
- SSD + HDD scaling: $$ (add HDD back end, add SSD, TME/IA, add RAM, TME/EA)
- HDD scaling: $
Virtualization Engine: Tiered Ingest/Egress Details
- An IO request from a customer initiator enters the front-end IO interface (SCSI target-mode transport and processing)
- Virtualization engine: IO request interface, ITL-nexus IO-mapper, and tier manager with Tier-0 and Tier-1 analyzers
- Per-VLUN ingest/egress FIFOs (VLUN1..VLUN-n) provide ingest IO reforming and egress IO read-ahead, backed by the Tier-0 cache
- Back-end IO: RAID-10 and RAID-50 mappings onto the SSD JBOF and HDD SAID
Virtualization Engine: Tiered Cache Details (same diagram, repeated per case)
- Write-back on read
- Write-through
- Read-hit
- Read-miss
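The four tiered-cache cases named above can be sketched with a toy two-tier model. This is an assumed illustration of the general policies (write-back on read, write-through, read-hit, read-miss), not Atrato's implementation; class and attribute names are invented.

```python
class Tier0Cache:
    """Toy SSD-over-HDD tier model using dicts as block stores."""

    def __init__(self):
        self.ssd = {}  # tier 0: block -> data (SSD JBOF)
        self.hdd = {}  # tier 1: block -> data (HDD SAID)

    def read(self, block):
        if block in self.ssd:
            return self.ssd[block]      # read-hit: served from SSD
        data = self.hdd.get(block)      # read-miss: fetch from HDD...
        if data is not None:
            self.ssd[block] = data      # ...write-back on read populates tier 0
        return data

    def write(self, block, data, write_through=True):
        self.ssd[block] = data
        if write_through:
            self.hdd[block] = data      # write-through: HDD updated immediately
        # Write-back mode would defer the HDD update to a background
        # flush, which this sketch does not model.
```

A read-miss thus pays HDD latency once, after which the block is tier-0 resident and subsequent reads are hits.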
Histogram Resolution Example
- Level-1 hot-spots: full-capacity HDD FV counter array over 128 MB cells (e.g. 4.5 MB of counters)
- Level-2 hot-spots: top-10% FV counter array at finer resolution (e.g. covering 7.5 GB)
- A background migration process (AVE region/block mapping update) moves hot blocks into the SSD 5% block cache; the SSD FV counter array identifies eviction candidates
- Note: hot-spot regions are not likely to be contiguous, so the diagram is figurative
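The level-1 selection step above - count accesses per coarse cell, then refine only the hottest fraction - can be sketched as follows. The cell size and top fraction mirror the slide's 128 MB cells and top-10% refinement; the function itself is an illustrative assumption.

```python
from collections import Counter

def hot_cells(accesses, cell_size, top_fraction=0.10):
    """Return the hottest top_fraction of cells by access count.

    accesses: iterable of LBAs
    cell_size: blocks per level-1 cell (e.g. a 128 MB region)
    """
    counts = Counter(lba // cell_size for lba in accesses)
    keep = max(1, int(len(counts) * top_fraction))
    return [cell for cell, _ in counts.most_common(keep)]
```

Only the cells returned here would get the finer level-2 counter arrays, which is what keeps the profiler's memory footprint small even at petabyte capacities.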
FV Stability Formulation
Definitions:
- FvSize: number of counters lumped per dimension
- NumBins: total counters (number of regions)
- FvDimension: number of elements in the vector
Normalized histogram taken at epoch t1 (feature vector sums to at most 1.0):

  Fv_t1[i] = ( Σ_{j = i·FvSize}^{(i+1)·FvSize - 1} Bin[j] ) / TotalSamples,   Σ_i Fv_t1[i] ≤ 1.0

Shape change between epochs t2 and t1:

  ΔShape = ( Σ_{i=0}^{FvDimension-1} | Fv_t2[i] - Fv_t1[i] | ) / 2.0,   0.0 ≤ ΔShape ≤ 1.0

- ΔFv = 0.0: no shape change
- ΔFv = 1.0: maximum shape change - unstable
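The ΔShape metric above is a half-L1 distance between two normalized histograms, which is what bounds it to [0, 1]. A direct transcription:

```python
def shape_change(fv_t1, fv_t2):
    """Half-L1 distance between two normalized feature vectors.

    Each vector sums to at most 1.0. Returns 0.0 for identical
    shapes and 1.0 for a maximal (fully disjoint) change.
    """
    return sum(abs(b - a) for a, b in zip(fv_t1, fv_t2)) / 2.0
```

A tiering engine could treat a small ΔShape as a stable access pattern (safe to migrate blocks) and a ΔShape near 1.0 as an unstable one (hold off on migration until the profile settles).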
Hash Mapping Example
- A bitmap array over 128 MB region cells indicates whether each region has Tier-0 cached data (pointer = yes, null pointer = none)
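A minimal sketch of that region lookup, using a dict in place of the pointer-per-region bitmap array (class and method names are invented for illustration):

```python
REGION_MB = 128  # region cell size from the slide

class RegionMap:
    """Map LBAs (in MB units here, for simplicity) to 128 MB regions
    and record which regions have tier-0 cached data."""

    def __init__(self):
        self.regions = {}  # region index -> handle for cached data

    def region_of(self, lba_mb):
        return lba_mb // REGION_MB

    def has_tier0(self, lba_mb):
        # Equivalent to testing the region's pointer for non-null.
        return self.region_of(lba_mb) in self.regions

    def cache(self, lba_mb, handle):
        self.regions[self.region_of(lba_mb)] = handle
```

This keeps the fast path a single constant-time lookup: an IO only consults tier-0 metadata when its region's entry is non-null.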
Tiering Speed-Up and Fit Formulation (reconstructed from the slide)
- sorted_access_counts[i]: per-LBA-set access counts, sorted in descending order; tier LBA sets are evaluated up to sizeof(tier)/LBA_set_size
- tier access fit (hit rate) = ( Σ_{i hosted on tier} sorted_access_counts[i] ) / total sorted IO accesses
- speed_up = T_HDD-only / ( T_SSD-hit + T_HDD-miss )
           = ave_HDD_latency / ( hit_rate × ave_SSD_latency + (1 - hit_rate) × ave_HDD_latency )
- Illustrative reading of the slide's numbers: 10,000 IOs take 2.2 sec HDD-only; with tiering, SSD hits take 0.4 sec and roughly 800 HDD misses take 0.6 sec, giving speed_up = 2.2 / (0.4 + 0.6) = 2.2x
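The speed-up formula reconstructed above can be computed directly; this helper is an illustration of the formula, with the latency values as parameters rather than the slide's specific measurements.

```python
def tier_speedup(hit_rate, ssd_latency, hdd_latency):
    """Average speed-up from serving hit_rate of IOs at SSD latency.

    hit_rate: fraction of IOs served from the SSD tier (0.0-1.0)
    ssd_latency, hdd_latency: average per-IO service times (same units)
    """
    mixed = hit_rate * ssd_latency + (1.0 - hit_rate) * hdd_latency
    return hdd_latency / mixed
```

Note the diminishing-returns shape: at hit_rate 0 the speed-up is 1.0, and it approaches hdd_latency/ssd_latency only as the hit rate approaches 1.0, which is why the profiler's job of maximizing tier access fit matters more than raw SSD capacity.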