Object storage platform: how can it help?
Martin Lenk, Specialist Senior Systems Engineer, Unstructured Data Solution, Dell EMC
Files vs. Objects
File metadata:
  Name: Picture.jpg
  Path: /mnt/pictures
  Owner: Mr. Bean
  Created Date: 23.8.17 13:38
  Modified Date: 28.8.17 08:38
  Permission: 750
Object metadata:
  Name: Picture.jpg
  Owner: Mr. Bean
  Date: 23.8.17 13:38
  Resolution: 512 x 418
  Location GPS: 50.0481661N, 14.4084219E
  Date Taken: 21.8.17 14:35
  Camera Type: Nikon D750
  Client Number: 56672146576
  Keyword 1: restaurant
  Keyword 2: serbian
  Keyword 3: exterior
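The difference between the two metadata sets can be sketched as plain Python dictionaries. This is only an illustration: the key names are made up for the example, and the values are the ones from the slide.

```python
# File metadata: a fixed set of attributes defined by the filesystem.
file_metadata = {
    "name": "Picture.jpg",
    "path": "/mnt/pictures",
    "owner": "Mr. Bean",
    "created": "23.8.17 13:38",
    "modified": "28.8.17 08:38",
    "permission": "750",
}

# Object metadata: basic attributes plus free-form, user-defined tags.
# The key names here are illustrative, not a real storage schema.
object_metadata = {
    "name": "Picture.jpg",
    "owner": "Mr. Bean",
    "date": "23.8.17 13:38",
    "resolution": (512, 418),
    "location_gps": ("50.0481661N", "14.4084219E"),
    "date_taken": "21.8.17 14:35",
    "camera_type": "Nikon D750",
    "client_number": "56672146576",
    "keywords": ["restaurant", "serbian", "exterior"],
}

# Attributes that exist only on the object side.
extra = sorted(set(object_metadata) - set(file_metadata))
print(extra)
```

Note that the object has no path: it is addressed by its ID, not by its place in a folder tree.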
NAS storage vs. Object storage
File storage:
  Unstructured data in a hierarchical folder structure
  Hierarchy needs to be maintained
  Throughput-oriented platform
  Frequently accessed content
  Limited metadata attributes
Object storage:
  No hierarchy to maintain
  Unlimited scalability
  Transaction-oriented platform
  Rich metadata stored with the objects
  Good for infrequently changing data
  Focused on cost
What is object storage?
A cloakroom for application data
Object = file + advanced metadata
The application puts its data into the object store and receives the ID or address of the object.
The application only needs to store the IDs of all stored objects in an internal database.
No need to implement a block (FC, iSCSI) or file (NFS, SMB, FTP) protocol; just access an API over HTTP.
No filesystem hierarchy management and maintenance
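The cloakroom idea can be sketched in a few lines of Python. This is a toy in-memory model, not a real object storage client: `ToyObjectStore`, `put`, `get` and `app_database` are hypothetical names invented for the illustration, and a real platform would be reached through an API over HTTP.

```python
import uuid


class ToyObjectStore:
    """In-memory stand-in for an object store (hypothetical, illustration only)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes, metadata: dict) -> str:
        """Store data with its metadata and hand back a 'cloakroom ticket'."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> tuple:
        """Return (data, metadata) for a previously issued ID."""
        return self._objects[object_id]


store = ToyObjectStore()

# The application keeps only the IDs, for example in its own database table.
app_database = {}
app_database["Picture.jpg"] = store.put(
    b"...jpeg bytes...",
    {"camera_type": "Nikon D750", "keyword": "restaurant"},
)

# Later, the ID is all that is needed to get the data and metadata back.
data, metadata = store.get(app_database["Picture.jpg"])
print(metadata["camera_type"])
```

The application never deals with folders or paths; it hands over the data, keeps the ticket, and presents the ticket again to retrieve the object.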
Object storage? So what? How can it help me?
Money $$$: lower price per TB for storing infrequently accessed data
Huge number of objects: ~100M to ~10B objects per system is no problem
Geo-distributed: replication between two sites? Hmm, OK, but what about three or more sites?
Tagging the objects: store advanced metadata without a separate SQL database
New application support: ask your developers whether they want to implement FC, iSCSI, NFS or SMB. What about simple access to the app repository over HTTP(S), with the application talking to an API?
Scalability and durability: easy to scale, the sky is the limit, an architecture built for exabytes (EB)
Object platform use cases
Traditional workloads:
  Archive platform: archive/long-term archive target with a lower price per TB
  Centera technology refresh with ECS
  Cloud tier (whether on-prem or off-prem), e.g. CloudPool of Isilon, DataDomain
  Global geo-distributed repository: multi-site solution
  Next-gen file sharing: enterprise sync and share like Dropbox, Drive etc.
  Tape library replacement, due to lack of agility, high operational costs and low reliability
Emerging workloads:
  New application development: agile DevOps, data storage = S3 target, no other way
  IoT machine data
Next-Gen Sync & Share solution
Dell EMC Object Evolution
2002:
  API (proprietary)
  Single namespace
  Immutable content
  Unstructured content
2007:
  REST API
  Geo-replication
  Global namespace
  Multi-tenant
  Unstructured data
2013:
  Universal API support
  Highly efficient geo storage
  Small and large data
  Comprehensive data storage: unstructured, semi-structured
ECS Software Architecture
ECS Software: NFS storage, object storage, HDFS storage
Geo-replicated data protection
Active-active read/write support with strong consistency
No single point of failure
Performance and efficiency for small and large objects
[Diagram: Site 1, Site 2, Site 3]
Multi-Protocol Support
S3 interface for compatibility with the Amazon interface
HCFS interface for efficient analytics
Drop-in replacement for OpenStack Swift
Share data with Windows clients
Share data with NFS clients
Upgrade from Centera and Atmos
CIFS-ECS (GeoDrive)
Features:
  Windows access (local or CIFS) to ECS S3 API
  Write and read caching
  Metadata tags and search UI
  Multipart upload and download
  Retention and versioning
  Life cycle
  ACL translation
  Client-side load balancing
Key Benefits:
  Ingest data in native Windows format
  Requires no change at the application level, accelerating the move to an object platform
Metadata Search Built In!
Features:
  Search across user-defined and system-level metadata
  Enabled at bucket level
  Support for <, >, <=, >=, =, !=, AND/OR
  Sort capabilities
  Specify attributes to be returned
  Pagination support and ability to specify limits for search results
Key Benefits:
  No need for a separate database; metadata search is built in
  Inline metadata, enabling instant search once data is written
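To make the operator list concrete, here is a minimal sketch of how such a search behaves, written as a toy in-memory filter. It does not reproduce the ECS query syntax or API: `OPS`, `matches`, `search` and the sample objects are all hypothetical names and data chosen for the illustration.

```python
import operator

# Comparison operators named on the slide, mapped to Python functions.
OPS = {
    "<": operator.lt, ">": operator.gt, "<=": operator.le,
    ">=": operator.ge, "=": operator.eq, "!=": operator.ne,
}


def matches(metadata, condition):
    """Check one (key, op, value) condition; a missing key never matches."""
    key, op, value = condition
    return key in metadata and OPS[op](metadata[key], value)


def search(objects, conditions, combine="AND", limit=None):
    """Return sorted IDs of objects whose metadata satisfies the conditions."""
    pick = all if combine == "AND" else any
    hits = sorted(
        object_id
        for object_id, metadata in objects.items()
        if pick(matches(metadata, c) for c in conditions)
    )
    return hits[:limit]  # limit=None returns every hit


# Hypothetical sample objects (ID -> metadata).
objects = {
    "obj-1": {"camera_type": "Nikon D750", "width": 512, "keyword": "restaurant"},
    "obj-2": {"camera_type": "Nikon D750", "width": 4000, "keyword": "exterior"},
    "obj-3": {"camera_type": "Other", "width": 800, "keyword": "restaurant"},
}

wide_nikon = search(
    objects, [("camera_type", "=", "Nikon D750"), ("width", ">=", 1000)]
)
first_two = search(
    objects,
    [("keyword", "=", "restaurant"), ("width", ">", 3000)],
    combine="OR",
    limit=2,
)
print(wide_nikon, first_two)
```

Because the metadata travels with the object, the filter needs no second system to consult; that is the point of the "no separate database" benefit.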
Migration to ECS? Yes, please.
ECS-SYNC
Features:
  File system to S3
  S3 to S3
  CAS to CAS
  Multi-threading
  Multipart upload
  Client-side load balancing
Key Benefits:
  Ingest data in native format or transform
  Very high performance
  Ability to schedule and monitor
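Why multi-threading matters for a migration can be shown with a small sketch that copies many objects in parallel. This is not how ECS-SYNC is implemented; `copy_one` and `parallel_copy` are hypothetical helpers, and two dictionaries stand in for the source and target systems.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_one(key, source, target):
    """Copy a single object; a real tool would do network I/O here."""
    target[key] = source[key]
    return key


def parallel_copy(source, target, workers=4):
    """Copy every object from source to target using a pool of threads."""
    keys = list(source)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the order of the submitted keys.
        copied = list(pool.map(lambda k: copy_one(k, source, target), keys))
    return copied


# Hypothetical source data: 20 small "files".
source = {f"file-{i}": bytes([i]) for i in range(20)}
target = {}
copied = parallel_copy(source, target)
print(len(copied), "objects copied")
```

With real storage on both ends, each copy spends most of its time waiting on the network, so several copies in flight at once keep the link busy.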
DataFrameworks - ClarityNow acquisition
Keys to Coexistence: File & Object
Index & Search:
  Scans file/object platforms, storage or cloud environments (scalability: PBs of capacity, more than 10^9 files)
  Consolidated view of all volumes, filesystems, buckets etc.
  Metadata index in an in-memory database for fast results
  Example environment: 6.1 billion files, 15.1 PiB; search finished in 16 seconds
Data Visibility - Business reporting:
  Chargeback & Showback
  By Project
  Cleanup Terms
DataFrameworks ClarityNow
Keys to Coexistence: File & Object
File-to-Object, Object-to-File
Seamless integration and movement
Self-service archive and restore, and cost management
Can be automated based on defined policy
Thank you
Martin Lenk, Specialist Senior System Engineer
+420 777 770 300
martin.lenk@dell.com
cz.linkedin.com/in/lenkm
ECS Gen3 Appliance
                        EX300 12TB          EX300 24TB          EX300 48TB          EX300 96TB          EX3000 1NODE 12TB   EX3000 2NODE 12TB
Model                   EX300               EX300               EX300               EX300               EX3000              EX3000
CPU                     8 core              8 core              8 core              8 core              2 x 6 core          2 x 6 core
Memory per node         64 GB               64 GB               64 GB               64 GB               64 GB               64 GB
Boot per node           1 x 480 GB BOSS     1 x 480 GB BOSS     1 x 480 GB BOSS     1 x 480 GB BOSS     1 x 480 GB*         1 x 480 GB*
Disk per node           12 x 1 TB           12 x 2 TB           12 x 4 TB           12 x 8 TB           45, 60, 90 x 12TB   30, 45 x 12TB
Disk type               HDD                 HDD                 HDD                 HDD                 HDD                 HDD
Rack cap min            60 TB               120 TB              240 TB              480 TB              2,700 TB            2,160 TB
Rack cap max            216 TB              432 TB              864 TB              1,728 TB            8,640 TB            8,640 TB
Networking connections  2 x 10 GbE*         2 x 10 GbE*         2 x 10 GbE*         2 x 10 GbE*         2 x 25 GbE          2 x 25 GbE
                        2 x 10 GbE          2 x 10 GbE          2 x 10 GbE          2 x 10 GbE          2 x 25 GbE          2 x 25 GbE
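The rack capacity figures can be cross-checked with simple arithmetic: rack capacity = nodes x disks per node x disk size. The sketch below works backwards to the node count each figure implies. The node counts are derived, not stated on the slide, and the sketch assumes that the minimum rack capacity uses the smallest disk count per node and the maximum uses the largest; `implied_nodes` is a hypothetical helper.

```python
def implied_nodes(rack_tb, disks_per_node, disk_tb):
    """Node count implied by a rack capacity, or None if it does not divide evenly."""
    per_node_tb = disks_per_node * disk_tb
    nodes, remainder = divmod(rack_tb, per_node_tb)
    return nodes if remainder == 0 else None


# (label, rack capacity in TB, disks per node, disk size in TB), from the table.
checks = [
    ("EX300 12TB min", 60, 12, 1),
    ("EX300 12TB max", 216, 12, 1),
    ("EX300 96TB min", 480, 12, 8),
    ("EX300 96TB max", 1728, 12, 8),
    ("EX3000 1NODE min", 2700, 45, 12),  # assumes the 45-disk option
    ("EX3000 1NODE max", 8640, 90, 12),  # assumes the 90-disk option
    ("EX3000 2NODE min", 2160, 30, 12),  # assumes the 30-disk option
    ("EX3000 2NODE max", 8640, 45, 12),  # assumes the 45-disk option
]

results = {label: implied_nodes(tb, disks, size) for label, tb, disks, size in checks}
for label, nodes in results.items():
    print(f"{label}: {nodes} nodes")
```

Every figure divides evenly under these assumptions, which suggests the table was reconstructed consistently.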