for HPC, HPDA and Machine Learning (ML)
Frank Kraemer, IBM Systems Architect, kraemerf@de.ibm.com
IBM Data Management for Autonomous Driving (AD)
Significantly increase development efficiency by reducing manual effort for video tagging, eliminating time wasted on data search and manual data copy/move processes, and automating workflows.
Significantly increase test throughput, allowing you to run more test cases in less time, which shortens time to market and improves the quality of your camera and ADAS products.
Reduce IT costs for local storage hardware by globally centralizing data.
Increase overall flexibility through the ability to move workloads from one place to another.
Guarantee long-term verifiability and recoverability of test data via archiving.
DESY High Performance Computing with Data
Introducing IBM Spectrum Scale
Highly scalable, high-performance unified storage for files and objects with integrated analytics
Remove data-related bottlenecks
  Demonstrated 400 GB/s throughput, building to 2.5 TB/s
  Local caching for read and write
Enable global collaboration
  Data lake serving HDFS, files and objects across sites
  Multi-cluster configurations; sync and async
Optimize cost and performance
  Up to 90% cost savings and 6x flash acceleration
  Transparently tier to cloud
Ensure data availability, integrity and security
  End-to-end checksum, Spectrum Scale RAID, NIST/FIPS certification
  Compression, encryption, audit logging
The History of Spectrum Scale * Gartner, Magic Quadrant for Distributed File Systems and Object, 20 October 2016, Document No. G00307798
IBM for unstructured data
Infrastructure requirement: Scalability; Flexibility; Agility; New gen workloads; Performance; Cloud; Object; Global
IBM answer: Parallel File System; Software Defined; Unified; HDFS connector; Parallel File System; OpenStack integration & Transparent Cloud Tiering; Unified; AFM for multi-cluster storage
Spectrum Scale: The flexible cognitive solution
[Architecture diagram; labels as follows]
Users and applications: HPC & AI, client workstations, Big Data analytics, compute farm
Global namespace interfaces: POSIX, NFS, SMB/CIFS, Object, HDFS
OpenStack: Controller, Cinder, Glance, Swift, Manila
Container: Docker, Kubernetes
IBM Spectrum Scale across Site A, Site B and Site C, on or off premises: automated data placement and data migration
Storage: flash, disk, tape, rich servers
Transparent Cloud Tiering; Cloud Data Sharing
IBM Spectrum Scale performance features
Performance leadership for large and small files
Highly Available Write Cache (HAWC): improves performance of small synchronous writes. Small synchronous writes are written to the log; as the log fills, they are rewritten to home.
Local Read Only Cache (LROC): extends the page pool memory to include local DAS/SSD for read caching.
Compression: compress what makes sense; extends to cache.
Quality of Service: throttle background functions such as rebuild or async replication; set by flexible policy, such as day-of-week and time-of-day.
Distributed and flash-accelerated metadata: metadata includes directories, inodes and indirect blocks.
Lift data to the highest tiers based on the file's heat.
Automate workload pipelines with Spectrum Computing LSF.
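The HAWC behaviour described on this slide (small synchronous writes go to a log, and the log is rewritten to home as it fills) can be illustrated with a toy model. This is only a sketch of the idea, not how Spectrum Scale implements it: the class name, the byte-based log capacity, the `small_write_limit` threshold and the rule that large writes bypass the log are all assumptions made for illustration.

```python
class WriteLogSketch:
    """Toy model: small sync writes land in a log and are later rewritten to home."""

    def __init__(self, log_capacity_bytes, small_write_limit):
        self.log_capacity_bytes = log_capacity_bytes  # hypothetical log size
        self.small_write_limit = small_write_limit    # hypothetical "small" threshold
        self.log = []        # (offset, data) pairs waiting to be rewritten
        self.log_bytes = 0
        self.home = {}       # offset -> data, stands in for the home location

    def sync_write(self, offset, data):
        """Acknowledge a synchronous write; return where the data landed."""
        if len(data) > self.small_write_limit:
            # Assumption: large writes skip the log. Flush first so that
            # older logged data can never overwrite this newer write.
            self.flush()
            self.home[offset] = data
            return "home"
        if self.log_bytes + len(data) > self.log_capacity_bytes:
            self.flush()     # the log is full: rewrite its contents to home
        self.log.append((offset, data))
        self.log_bytes += len(data)
        return "log"

    def flush(self):
        """Rewrite logged writes to home in arrival order, then empty the log."""
        for offset, data in self.log:
            self.home[offset] = data
        self.log.clear()
        self.log_bytes = 0
```

A caller would issue `sync_write(offset, data)` for each write; only when the log capacity would be exceeded does the slower rewrite to home happen, which is why small synchronous writes can be acknowledged quickly.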
Active File Management (AFM)
Ties together multiple clusters to serve users across the globe
Spans geographic distance and unreliable networks
Caches local copies of data distributed to one or more Spectrum Scale clusters
Low-latency local read and write performance
As data is written or modified at one location, all other locations see that same data
Efficient data transfers over the wide area network (WAN)
Speeds data access for collaborators and resources around the world
Unifies heterogeneous remote storage
Asynchronous DR is a special case of AFM
  Bidirectional awareness for fail-over and fail-back with data integrity
  Recovery Point Objectives for volume and application consistency
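The caching idea behind AFM (local reads and writes at the cache site, with changes sent to the home site asynchronously) can be sketched with a toy model. Everything here is hypothetical: `HomeSite`, `CacheSite` and `push_pending` are illustrative names, and the sketch ignores revalidation, conflicts, multiple cache sites and network failures, all of which a real deployment has to handle.

```python
from collections import deque


class HomeSite:
    """Stands in for the cluster that owns the authoritative copy."""

    def __init__(self):
        self.files = {}


class CacheSite:
    """Stands in for a remote cluster that caches data from the home site."""

    def __init__(self, home):
        self.home = home
        self.local = {}          # locally cached file contents
        self.pending = deque()   # writes not yet sent to home

    def read(self, path):
        # First access fetches over the WAN; later reads are served locally.
        if path not in self.local:
            self.local[path] = self.home.files[path]
        return self.local[path]

    def write(self, path, data):
        # The write completes locally; the update is queued for home.
        self.local[path] = data
        self.pending.append((path, data))

    def push_pending(self):
        """Send queued writes to home in order; return how many were sent."""
        pushed = 0
        while self.pending:
            path, data = self.pending.popleft()
            self.home.files[path] = data
            pushed += 1
        return pushed
```

Between a `write` and the next `push_pending`, the home site still holds the old version, which is the window that a recovery point objective has to account for in the asynchronous DR case.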
HPC performance with simplified user access
Transparent tiering and data migration; analyze and archive in place
Enterprise HPC with flash for performance
Network Shared Disk for modular scaling
Tier data based upon policy, user actions or workflow
Lower costs with tape, object or cloud
2nd site: data always available to end users
Auto-migrate to higher tiers
Full data, not stubs
Global namespace extends across physical storage and multiple sites
General high-capacity tiered NAS with fast data ingest/retention/share and long-term retention
Deployed today at multiple clients
[Diagram labels: NFS/SMB cluster; System pool (Flash); Gold pool (Disk); Tape Library]
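Policy-based tiering of the kind this slide describes can be pictured as a placement function that fills the fastest pool with the hottest files and moves down the tiers as pools fill up. The sketch below is a simplified illustration, not the Spectrum Scale policy engine: the function name, the numeric heat values and the pool capacities are assumptions, and only the pool names (system, gold, tape) are taken from the slide.

```python
def place_by_heat(files, pools):
    """Assign files to pools, hottest files first, fastest pool first.

    files: list of (name, size, heat) tuples (heat values are hypothetical).
    pools: list of (pool_name, capacity) tuples ordered fastest to slowest.
    Returns a dict mapping file name to pool name.
    """
    remaining = [capacity for _, capacity in pools]
    placement = {}
    tier = 0  # index of the pool currently being filled; it never moves back up
    for name, size, _heat in sorted(files, key=lambda f: f[2], reverse=True):
        # Step down to the next pool when the current one cannot hold the file.
        while tier < len(pools) and size > remaining[tier]:
            tier += 1
        if tier == len(pools):
            raise ValueError(f"no remaining pool can hold {name!r}")
        remaining[tier] -= size
        placement[name] = pools[tier][0]
    return placement


if __name__ == "__main__":
    example_files = [("c.dat", 40, 0.1), ("a.dat", 50, 0.9),
                     ("d.dat", 10, 0.05), ("b.dat", 60, 0.5)]
    example_pools = [("system", 100), ("gold", 100), ("tape", 1000)]
    print(place_by_heat(example_files, example_pools))
```

Keeping the tier index monotonic means a colder file is never placed in a faster pool than a hotter one, at the cost of leaving some space unused in the faster pools.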
Unified File & Object + HDFS
Store everywhere. Run anywhere.
Challenge: object storage for data and cloud
  Seamless scaling
  RESTful data access
  Object metadata replaces hierarchy
IBM Spectrum Scale Swift & S3
  High performance for object
  Native OpenStack Swift support with S3
  File or object in; object or file out
  Enterprise data protection and features
Full OpenStack cloud support
  Cinder (block), Manila (file), Swift (object)
Analytics without complexity
Store everywhere. Run anywhere.
Challenge: separate storage systems for ingest, analysis and results
  HDFS requires locality-aware storage (namenode)
  Data transfer slows time to results
  Different frameworks and analytics tools use data differently
HDFS Transparency
  Map/Reduce on shared or shared-nothing storage
  No waiting for data transfer between storage systems
  Immediately share results
  Single data lake for all applications
  Enterprise data management
  Archive and analysis in place
[Diagram labels: Raw Data; Ingest; Analysis; Direct Access: File, Object, POSIX]
Comparing HDFS v. IBM Spectrum Scale
Preliminary results presented at SC16 U/G
20 nodes (compute and storage); Cloudera; Map/Reduce
Compute the average temperature for every grid point (x, y and z); vary by the total number of years
MERRA Monthly Means (Reanalysis)
Comparison of serial C code to MapReduce code
Comparison of traditional HDFS (Hadoop), where data is sequenced (modified), with GPFS, where data is native NetCDF (unmodified, copy)
Using unmodified data in GPFS with MapReduce is the fastest
Only showing GPFS results to compare against HDFS
[Chart: DASS Initial Serial Performance]
http://files.gpfsug.org/presentations/2016/sc16/06_-_carrie_spear_-_spectrum_sclale_and_hdfs.pdf
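The benchmark task named on this slide, averaging temperature per grid point over many monthly fields, has a natural map/reduce shape. The following is a minimal pure-Python sketch of that shape only; it is not the code used in the SC16 study. The in-memory dictionary layout keyed by `(x, y, z)` is an assumption for illustration, whereas the real data is stored as NetCDF.

```python
from functools import reduce


def map_field(field):
    """Map step: emit ((x, y, z), (temperature, 1)) for each grid point."""
    for point, temperature in field.items():
        yield point, (temperature, 1)


def combine(sums, pair):
    """Reduce step: accumulate a running (total, count) per grid point."""
    point, (total, count) = pair
    prev_total, prev_count = sums.get(point, (0.0, 0))
    sums[point] = (prev_total + total, prev_count + count)
    return sums


def average_temperature(fields):
    """Average every grid point across all monthly fields."""
    pairs = (pair for field in fields for pair in map_field(field))
    sums = reduce(combine, pairs, {})
    return {point: total / count for point, (total, count) in sums.items()}


if __name__ == "__main__":
    # Two hypothetical monthly fields on a tiny two-point grid (kelvin).
    months = [
        {(0, 0, 0): 280.0, (1, 0, 0): 290.0},
        {(0, 0, 0): 284.0, (1, 0, 0): 286.0},
    ]
    print(average_temperature(months))
```

Adding more years only lengthens the list of fields, which is why the study could vary the number of years without changing the computation.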
IBM Spectrum Scale: Transparent Cloud Tiering
Single namespace and control of data placement for hybrid cloud
Intelligent data placement
  On- or off-premises objects
  Policy-driven tiering
  Managed data placement or migration of cold data
Automated data movement
  Recall on user demand
IBM Spectrum Scale
  High performance
  Single namespace
  Unified file, object and HDFS
Encrypted
  Secure data in cloud
IBM Spectrum Scale: Cloud Data Sharing
Policy-driven data movement for hybrid cloud
Managed data sharing
Policy-driven replication and synchronization
Granular control: type, action, metadata or heat
Bridging cloud and file-to-storage
Data and metadata
Automated data movement
Secure, reliable connection
High-speed and scalable
Clustered configurations
IBM Spectrum Scale
High-performance file, object and HDFS
Clustered, tiered and scalable
Bridge legacy applications and new workloads
Cloud storage
Cloud-native applications
DevOps development
New workloads
IBM Spectrum Scale Features and Benefits
management at scale
Store everywhere. Run anywhere.
Improve data economics
Software Defined Open Platform
Simplified, self-tuning options
New GUI and health monitoring
Unified File, Object & HDFS
Distributed metadata and high-speed scanning
QoS management
1 billion files and yottabytes of data
Multi-cluster and system management integration with IBM Spectrum Control
Advanced routing with latency awareness
Read or write caching
Active File Management for WAN deployments
File Placement Optimization
End-to-end data integrity
Snapshots
Sync or async DR
zLinux support
Tier seamlessly
Incorporate and share flash
Policy-driven compression
Data protection with erasure code and replication
Native encryption and secure erase compliance
Target object store and cloud
Leading performance for backup and archive
Heterogeneous commodity storage: flash, disk and tape
Software, appliance or cloud
Data-driven migration to practically any target
File/object in/out with OpenStack Swift & S3
Transparent native HDFS
Integration with cloud
New Generation Performance and Capacity
Spectrum Scale
Announced on April 11, 2017
New all-flash options in Q3
Model GL2: 2 enclosures, 12U; 116 NL-SAS, 2 SSD; 8 GB/s; max 0.9 PB raw
New! Model GL2S: 2 enclosures, 14U; 166 NL-SAS, 2 SSD; 12 GB/s; max 1.6 PB raw
Model GL4: 4 enclosures, 20U; 232 NL-SAS, 2 SSD; 17 GB/s; max 1.8 PB raw
New! Model GL4S: 4 enclosures, 24U; 334 NL-SAS, 2 SSD; 24 GB/s; max 3.3 PB raw
Model GL6: 6 enclosures, 28U; 348 NL-SAS, 2 SSD; 25 GB/s; max 2.8 PB raw
New! Model GL6S: 6 enclosures, 34U; 502 NL-SAS, 2 SSD; 36 GB/s; max 5 PB raw
All Flash: All Flash Performance
All Flash Speed
*NEW* Model GS1S: 24 SSD drives; 368 TB raw; 14 GB/s
Model GS2S: 48 SSD drives; 736 TB raw; 26 GB/s
Model GS4S: 96 SSD drives; 1472 TB raw; 40 GB/s
Capacity calculations are based on 15.36 TB SSDs.
Performance numbers are based on a 4 MB block size and 100% reads without any cache hits. Writes are typically 20% lower than read performance.
All numbers are based on 100 Gb EDR (4 ports connected per node).
[Diagram labels: System x3650 M4; EXP3524]
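The capacity and write figures on this slide follow from simple arithmetic: raw capacity is the drive count times 15.36 TB, and write throughput is estimated at roughly 80% of the quoted read throughput. The short sketch below reproduces that arithmetic. The function and constant names are illustrative, and the 0.8 factor is just the slide's "typically 20% less" rule of thumb, not a measured value.

```python
SSD_TB = 15.36       # drive size stated on the slide
WRITE_FACTOR = 0.8   # "writes are typically 20% less than read performance"

# model -> (number of SSDs, quoted read throughput in GB/s), from the slide
MODELS = {"GS1S": (24, 14), "GS2S": (48, 26), "GS4S": (96, 40)}


def raw_capacity_tb(drives, drive_tb=SSD_TB):
    """Raw capacity in TB: drive count times drive size."""
    return drives * drive_tb


def estimated_write_gbps(read_gbps, factor=WRITE_FACTOR):
    """Rule-of-thumb write throughput derived from the read figure."""
    return read_gbps * factor


for model, (drives, read_gbps) in MODELS.items():
    capacity = raw_capacity_tb(drives)
    write = estimated_write_gbps(read_gbps)
    print(f"{model}: {capacity:.2f} TB raw, about {write:.1f} GB/s write")
```

The products come out slightly above the slide's figures (for example 24 x 15.36 = 368.64 TB against the quoted 368 TB), which suggests the slide truncates the smallest configuration and doubles it for the larger ones.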
IBM Software-Defined Portfolio
IBM Spectrum Control: hybrid cloud storage and data management that helps optimize applications and reduce costs by up to 73%
IBM Spectrum Protect: optimized hybrid cloud data protection that can simplify restores and reduce backup costs by up to 53%
IBM Spectrum Virtualize: virtualization and optimization of hybrid cloud block environments that helps improve flexibility and stores up to 5x more data
IBM Spectrum Archive: long-term retention for active archive data that lowers costs by up to 90% by delivering a fast tape file system
IBM Spectrum Accelerate: highly flexible, scale-out enterprise block storage for hybrid clouds that deploys in minutes
IBM Spectrum Scale: high-performance, highly scalable hybrid cloud storage for unstructured data
IBM Cloud Object: flexible and economical scalable hybrid cloud object storage with geo-dispersed enterprise availability and security
IBM Spectrum CDM: simplified copy data management that can increase business velocity and efficiency
[Diagram labels: Family of Management and Optimization Software; Private, Public or Hybrid Cloud; Any; Flash; Cloud Services; Rich Servers; Secure; Efficient; High-Performance; Hybrid Cloud]