Storage for HPC, HPDA and Machine Learning (ML)

Frank Kraemer, IBM Systems Architect, kraemerf@de.ibm.com

IBM Data Management for Autonomous Driving (AD)
- Significantly increase development efficiency by reducing manual effort for video tagging, eliminating time wasted on data search and manual data copy/move processes, and automating workflows
- Significantly increase test throughput, allowing you to run more test cases in less time, thereby improving time-to-market as well as the quality of your camera and ADAS products
- Reduce IT costs for local storage hardware by globally centralizing data
- Increase overall flexibility through the ability to move workloads from one place to another
- Guarantee long-term verifiability and recoverability of test data via archiving

DESY High Performance Computing with Data

Introducing IBM Spectrum Scale
Highly scalable, high-performance unified storage for files and objects with integrated analytics

Remove data-related bottlenecks
- Demonstrated 400 GB/s throughput, building to 2.5 TB/s
- Local caching for read and write
Enable global collaboration
- Data lake serving HDFS, files and object across sites
- Multi-cluster configurations; sync and async
Optimize cost and performance
- Up to 90% cost savings and 6x flash acceleration
- Transparently tier to cloud
Ensure data availability, integrity and security
- End-to-end checksum, Spectrum Scale RAID, NIST/FIPS certification
- Compression, encryption, audit logging

The History of Spectrum Scale
* Gartner, Magic Quadrant for Distributed File Systems and Object Storage, 20 October 2016, Document No. G00307798

IBM for unstructured data
Infrastructure requirements: Scalability, Flexibility, Agility
New-generation workloads: Performance, Cloud, Object, Global
IBM answer: Parallel File System; Software Defined; Unified; HDFS connector; Parallel File System; OpenStack integration & Transparent Cloud Tiering; Unified; AFM for multi-cluster storage

Spectrum Scale: The flexible cognitive solution
[Diagram: users and applications (client workstations, compute farm; HPC & AI, Big Data Analytics) on a global namespace across Site A, Site B and Site C]
- Access protocols: POSIX, NFS, SMB/CIFS, Object, HDFS
- OpenStack: Cinder, Glance, Swift, Manila
- Containers: Docker, Kubernetes
- IBM Spectrum Scale: automated data placement and data migration, on/off premise
- Storage tiers: Flash, Disk, Tape, Rich Servers
- Cloud: Transparent Cloud Tiering, Cloud Data Sharing

IBM Spectrum Scale performance features
Performance leadership for large and small files

Highly Available Write Cache (HAWC)
- Improves performance of small synchronous writes
- Small synchronous writes are written to the log; as the log fills, they are rewritten to home
Local Read-Only Cache (LROC)
- Extends the page pool memory to include local DAS/SSD for read caching
Compression
- Compress what makes sense; extends to cache
Quality of Service
- Throttle background functions such as rebuild or async replication
- Set by flexible policy, such as day-of-week and time-of-day
Distributed and flash-accelerated metadata
- Metadata includes directories, inodes, indirect blocks
Lift data to the highest tiers based on the file's heat
Automate workload pipelines with Spectrum Computing LSF
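The HAWC behaviour listed on this slide (small synchronous writes land in a log and are rewritten to their home location as the log fills) can be illustrated with a toy model. This is a minimal Python sketch, not IBM's implementation: the class name `ToyWriteCache`, the small-write threshold and the log capacity are all assumptions made up for illustration.

```python
class ToyWriteCache:
    """Toy model of a fast write log in front of slower 'home' storage."""

    def __init__(self, small_write_limit=64 * 1024, log_capacity=256 * 1024):
        self.small_write_limit = small_write_limit  # assumed threshold (bytes)
        self.log_capacity = log_capacity            # assumed log size (bytes)
        self.log = []       # (offset, data) entries waiting in the fast log
        self.log_bytes = 0
        self.home = {}      # offset -> data, stands in for home storage

    def write(self, offset, data):
        """Small writes go to the log; large writes go straight home."""
        if len(data) > self.small_write_limit:
            # Drop any older logged copy so a later flush cannot overwrite
            # the newer data written directly to home.
            self.log = [(o, d) for o, d in self.log if o != offset]
            self.log_bytes = sum(len(d) for _, d in self.log)
            self.home[offset] = data
            return "home"
        if self.log_bytes + len(data) > self.log_capacity:
            self.flush()  # log is full: rewrite its contents to home first
        self.log.append((offset, data))
        self.log_bytes += len(data)
        return "log"

    def flush(self):
        """Rewrite every logged entry to home, oldest first, and empty the log."""
        for offset, data in self.log:
            self.home[offset] = data
        self.log.clear()
        self.log_bytes = 0

    def read(self, offset):
        """The newest logged copy wins over the home copy."""
        for logged_offset, data in reversed(self.log):
            if logged_offset == offset:
                return data
        return self.home.get(offset)
```

The point of the model is only the ordering: a small write is acknowledged once it is in the log, and the slower rewrite to home happens later in bulk.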

Active File Management (AFM)
- Ties together multiple clusters to serve users across the globe
- Spans geographic distance and unreliable networks
- Caches local copies of data distributed to one or more Spectrum Scale clusters
- Low-latency local read and write performance
- As data is written or modified at one location, all other locations see that same data
- Efficient data transfers over the wide area network (WAN)
- Speeds data access for collaborators and resources around the world
- Unifies heterogeneous remote storage
- Asynchronous DR is a special case of AFM
- Bidirectional awareness for fail-over and fail-back with data integrity
- Recovery Point Objectives for volume and application consistency

HPC performance with simplified user access
Transparent tiering & data migration; analyze and archive in place

- Enterprise HPC with flash for performance
- Network Shared Disk for modular scaling
- Tier data based upon policy, user actions or workflow
- Lower cost with tape, object, or cloud
- 2nd site
- Data always available to end users
- Auto-migrate to higher tiers
- Full data, not stubs
- Global namespace extends across physical storage and multiple sites
[Diagram: NFS/SMB cluster; System pool (Flash); Gold pool (Disk); Tape library]
General high-capacity tiered NAS with fast data ingest/retention/share and long-term retention. Deployed today at multiple clients.
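To make "tier data based upon policy" concrete, here is a minimal Python sketch of a placement decision over the three tiers named on this slide (System pool on flash, Gold pool on disk, tape). It does not use Spectrum Scale's own policy language; the `FileInfo` fields, the heat score and both thresholds are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    pool: str                # pool the file currently lives in
    days_since_access: int
    heat: float              # hypothetical 0.0-1.0 access-frequency score


def choose_pool(f, hot_heat=0.7, cold_days=90):
    """Pick a target pool; both thresholds are assumed, not product defaults."""
    if f.heat >= hot_heat:
        return "system"      # flash tier for hot files
    if f.days_since_access >= cold_days:
        return "tape"        # long-term retention for cold files
    return "gold"            # disk tier for everything else


def plan_migrations(files):
    """Return (path, from_pool, to_pool) for every file that should move."""
    moves = []
    for f in files:
        target = choose_pool(f)
        if target != f.pool:
            moves.append((f.path, f.pool, target))
    return moves
```

A scheduler could run `plan_migrations` periodically over a file scan and hand the resulting moves to whatever data mover is in use; the user-visible path stays the same, which is what the global namespace on the slide is for.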

Unified File & Object + HDFS
Store everywhere. Run anywhere.

Challenge: object storage for data & cloud
- Seamless scaling
- RESTful data access
- Object metadata replaces hierarchy
IBM Spectrum Scale Swift & S3
- High performance for object
- Native OpenStack Swift support with S3
- File or object in; object or file out
- Enterprise data protection & features
- Full OpenStack cloud support: Cinder (block), Manila (file), Swift (object)

Analytics without complexity
Store everywhere. Run anywhere.

Challenge: separate storage systems for ingest, analysis, results
- HDFS requires locality-aware storage (namenode)
- Data transfer slows time to results
- Different frameworks & analytics tools use data differently
[Diagram: Raw data, Ingest, Analysis]
HDFS Transparency
- Map/Reduce on shared or shared-nothing storage
- No waiting for data transfer between storage systems
- Immediately share results
- Single data lake for all applications
- Enterprise data management
- Archive and analysis in place
Direct access: File, Object, POSIX

Comparing HDFS v. IBM Spectrum Scale
Preliminary results presented at SC16 U/G
- 20 nodes (compute & storage)
- Cloudera Map/Reduce
- Compute the average temperature for every grid point (x, y, and z)
- Vary by the total number of years
- MERRA Monthly Means (Reanalysis)
- Comparison of serial C code to MapReduce code
- Comparison of traditional HDFS (Hadoop), where data is sequenced (modified), with GPFS, where data is native NetCDF (unmodified, copy)
- Using unmodified data in GPFS with MapReduce is the fastest
- Only showing GPFS results to compare against HDFS
[Chart: DASS initial serial performance]
http://files.gpfsug.org/presentations/2016/sc16/06_-_carrie_spear_-_spectrum_sclale_and_hdfs.pdf
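The benchmark task, averaging temperature for every (x, y, z) grid point, maps naturally onto a map and a reduce step. The following is a small in-memory Python sketch of that pattern only; it is not the code used in the SC16 comparison, it does not read NetCDF or MERRA data, and the record layout `(x, y, z, temperature)` is an assumption.

```python
from collections import defaultdict


def map_phase(records):
    """Emit ((x, y, z), (temperature, 1)) for each input record."""
    for x, y, z, temperature in records:
        yield (x, y, z), (temperature, 1)


def reduce_phase(pairs):
    """Sum temperatures and counts per grid point, then divide."""
    totals = defaultdict(lambda: [0.0, 0])
    for key, (temperature, count) in pairs:
        totals[key][0] += temperature
        totals[key][1] += count
    return {key: total / count for key, (total, count) in totals.items()}


def average_temperature(records):
    """Run both phases over an iterable of (x, y, z, temperature) tuples."""
    return reduce_phase(map_phase(records))
```

Emitting a `(sum, count)` pair instead of a bare value keeps the reduction associative, so partial results from different nodes or different years can be combined before the final division.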

IBM Spectrum Scale: Transparent Cloud Tiering
Single namespace and control of data placement for hybrid cloud

Intelligent data placement
- On- or off-premises objects
- Policy-driven tiering
Managed data placement or migration of cold data
- Automated data movement
- Recall on user demand
IBM Spectrum Scale
- High performance
- Single namespace
- Unified file, object and HDFS
Encrypted
- Secure data in cloud

IBM Spectrum Scale: Cloud Data Sharing
Policy-driven data movement for hybrid cloud

Managed data sharing
- Policy-driven replication and synchronization
- Granular control: type, action, metadata or heat
Bridging cloud and file (storage-to-storage)
- Data and metadata
- Automated data movement
Secure, reliable connection
- High-speed and scalable
- Clustered configurations
IBM Spectrum Scale
- High-performance file, object and HDFS
- Clustered, tiered and scalable
- Bridge legacy applications and new workloads
Cloud storage
- Cloud-native applications
- DevOps development
- New workloads

IBM Spectrum Scale Features and Benefits

Management at scale
- Simplified, self-tuning options
- New GUI & health monitoring
- Unified File, Object & HDFS
- Distributed metadata & high-speed scanning
- QoS management
- 1 billion files & yottabytes of data
- Multi-cluster and system management integration with IBM Spectrum Control
Store everywhere. Run anywhere.
- Advanced routing with latency awareness
- Read or write caching
- Active File Management for WAN deployments
- File Placement Optimization
- End-to-end data integrity
- Snapshots
- Sync or async DR
- zLinux support
Improve data economics
- Tier seamlessly
- Incorporate and share flash
- Policy-driven compression
- Data protection with erasure code and replication
- Native encryption and secure erase compliance
- Target object store and cloud
- Leading performance for backup and archive
Software Defined Open Platform
- Heterogeneous commodity storage: flash, disk & tape
- Software, appliance or cloud
- Data-driven migration to practically any target
- File/Object In/Out with OpenStack Swift & S3
- Transparent native HDFS
- Integration with cloud

New Generation Performance and Capacity
Spectrum Scale. Announced on April 11, 2017. New all-flash options in Q3.

Model        Enclosures  Height  NL-SAS  SSD  Throughput  Max raw
GL2          2           12U     116     2    8 GB/s      0.9 PB
GL2S (new)   2           14U     166     2    12 GB/s     1.6 PB
GL4          4           20U     232     2    17 GB/s     1.8 PB
GL4S (new)   4           24U     334     2    24 GB/s     3.3 PB
GL6          6           28U     348     2    25 GB/s     2.8 PB
GL6S (new)   6           34U     502     2    36 GB/s     5 PB

All Flash Performance *NEW*
[Diagram: System x3650 M4 servers with EXP3524 enclosures]

Model  SSD drives  Raw capacity  Throughput
GS1S   24          368 TB        14 GB/s
GS2S   48          736 TB        26 GB/s
GS4S   96          1472 TB       40 GB/s

- Capacity calculations are based on 15.36 TB SSDs.
- Performance numbers are based on a 4 MB block size and 100% reads without any cache hits. Writes are typically 20% less than read performance.
- All numbers are based on 100 Gb EDR (4 ports connected per node).
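The slide's capacity and write-performance notes are simple arithmetic, which the short Python sketch below reproduces. The drive counts, the 15.36 TB drive size, the read throughputs and the "writes are typically 20% less" rule come from the slide; the function names are made up here, and the write figures are rough estimates derived from that rule, not measured values.

```python
SSD_TB = 15.36  # drive size stated on the slide

# model -> (number of SSDs, quoted read throughput in GB/s), from the slide
MODELS = {"GS1S": (24, 14), "GS2S": (48, 26), "GS4S": (96, 40)}


def raw_capacity_tb(ssd_count):
    """Raw capacity is simply drive count times drive size."""
    return ssd_count * SSD_TB


def estimated_write_gbps(read_gbps, write_penalty=0.20):
    """Apply the slide's 'writes are typically 20% less' rule of thumb."""
    return read_gbps * (1.0 - write_penalty)


for model, (ssds, read_gbps) in MODELS.items():
    print(
        f"{model}: {raw_capacity_tb(ssds):.2f} TB raw, "
        f"{read_gbps} GB/s read, ~{estimated_write_gbps(read_gbps):.1f} GB/s write"
    )
```

The computed raw capacities (368.64, 737.28 and 1474.56 TB) come out slightly above the 368, 736 and 1472 TB quoted on the slide, so the slide's figures appear to be rounded down.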

IBM Software-Defined Portfolio
Family of Management and Optimization Software

- IBM Spectrum Control: hybrid cloud storage and data management that helps optimize applications and reduce costs by up to 73%
- IBM Spectrum Protect: optimized hybrid cloud data protection that can simplify restores and reduce backup costs by up to 53%
- IBM Spectrum Virtualize: virtualization and optimization of hybrid cloud block environments that helps improve flexibility and stores up to 5x more data
- IBM Spectrum Archive: long-term retention for active archive data that lowers costs up to 90% by delivering a fast tape file system
- IBM Spectrum Accelerate: highly flexible, scale-out enterprise block storage for hybrid clouds that deploys in minutes
- IBM Spectrum Scale: high-performance, highly scalable hybrid cloud storage for unstructured data
- IBM Cloud Object Storage: flexible and economical scalable hybrid cloud object storage with geo-dispersed enterprise availability and security
- IBM Spectrum CDM: simplified copy data management that can increase business velocity and efficiency

[Diagram: Private, Public or Hybrid Cloud; Any Flash; Cloud Services; Rich Servers; Secure; Efficient; High-Performance; Hybrid Cloud]