Technology Challenges for Clouds. Henry Newman, CTO, Instrumental, Inc.


Technology Challenges for Clouds
Henry Newman, CTO, Instrumental, Inc. hsn@instrumental.com

I/O Modeling Plan: What is the Weather
March 31, 2010. Henry Newman, CTO, Instrumental, Inc. hsn@instrumental.com

Agenda
- Who is Instrumental
- Challenges for Cloud architectures
- Challenges for Cloud customers
- Cloud storage technology review
- Next steps
Slide 3 of 25. Copyright 2010, Instrumental, Inc.

Who is Instrumental
- Founded in 1991; headquarters in Bloomington, MN; office in Reston, VA
- Core team of professionals: system analysts and architects with broad experience with challenging problems
- Focused on technology: performance-based professional services
- HPC computation, storage and networking; UNIX, Linux and Windows
Slide 4 of 25

Professional Services
- Instrumental delivers performance, specializing in helping customers achieve a sustainable, competitive advantage
- Our mission: to deliver excellence and innovation in advanced computing environments
- Proven capabilities:
  - Computation, storage, networking and usability
  - Architecture, design and integration services
  - Performance management
  - System analysis and optimization
  - Modeling and simulation
Slide 5 of 25

Instrumental Expertise
- Complex challenges: exploding costs and tighter budgets; liability and reputation issues
- Rigid requirements: not always performance based; control and usability
- Scalable solutions: CPUs, memory, I/O and networking
  - 10,000s of processors
  - 100s of TB/sec for the file system
  - 10s of petabytes in the file system and HSM
Slide 6 of 25

Vendor Neutral
- We do not resell anything, hardware or software
- We are not compensated for recommendations: no finder's fees, no commissions
- Direct support of many agencies: a trusted partner; vendors respect us and sometimes fear us
- Support for prime contractors: sometimes we are asked to be the integrators for complex architectures
Slide 7 of 25

Challenges for Cloud Architectures: Things often not considered

Obvious Issues
- Security
  - Considered the biggest challenge
  - At best a very difficult problem to solve, given the current state of the security stack
  - Hostile parties actively seek access
- QoS
  - No end-to-end way to provide consistent service
  - No planned changes in POSIX to allow this
Slide 9 of 25

Non-Obvious Issues: Cost
- Network bandwidth costs
  - An OC-192 is roughly equal to a single 10 GbE link
  - The OC-192 line costs well over $100K, not counting hardware
  - A two-port 10 GbE server NIC is about $350
  - 10 GbE switch ports cost far less than OC-192 network gear
- Fast network bandwidth is needed for many access and replication tasks
Slide 10 of 25

Non-Obvious Issues: Latency
- Famous quote from John Mashey: "Money can buy you bandwidth, but latency is forever"
- File system protocols do not address the latency problem; applications need to be written to address it
- Estimated latency from NY to CA is about 35 milliseconds, roughly 7x the latency of the fastest spinning disk
- Latency will have an impact on:
  - Small files and small I/O requests
  - File and directory lookups
Slide 11 of 25
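The latency point can be illustrated with a rough calculation. This is a hypothetical sketch: the 35 ms WAN and ~5 ms disk figures come from the slide, while the 1,000-operation workload is an assumed example.

```python
# Serialized metadata operations (e.g. directory lookups) pay one round
# trip each, so latency, not bandwidth, dominates small-I/O workloads.
WAN_RTT_S = 0.035       # NY-to-CA latency, per the slide
DISK_LATENCY_S = 0.005  # ~fastest spinning disk (~7x lower, per the slide)
LOOKUPS = 1000          # assumed metadata-heavy workload

wan_time = LOOKUPS * WAN_RTT_S        # ~35 s, regardless of link bandwidth
local_time = LOOKUPS * DISK_LATENCY_S # ~5 s
print(f"WAN: {wan_time:.0f} s, local disk: {local_time:.0f} s")
```

No amount of extra bandwidth shrinks the 35 seconds; only restructuring the application (batching, caching, fewer round trips) does.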

Challenges for Cloud Customers: Technology Challenges

Some of the Challenges
- Streaming performance impact: local disks and networks vs. remote disks and networks
- Latency impact: likely 10x different, given switching and protocols
- Protocols for multi-user access to the same file
  - POSIX locks the file to prevent out-of-order writes
  - Some remote protocols bypass POSIX (NFSv3, for example)
Slide 13 of 25

Cloud Storage Technology Review: Replication for large storage systems will be costly

Impact of Disk Technology on Clouds
- The next few slides step through industry-published data on disk technology and its impact on cloud architectures
- First, some definitions:
- Bit Error Rate (BER): in telecommunication transmission, the BER is the percentage of bits that have errors relative to the total number of bits received, usually expressed as ten to a negative power. For example, a transmission might have a BER of 10 to the minus 6, meaning that out of 1,000,000 bits transmitted, one bit was in error. The BER indicates how often a packet or other data unit has to be retransmitted because of an error.
Slide 15 of 25

More Definitions
- Hard Error: an unrecoverable error that occurs when, for example, a disk encounters an area of the media that is damaged and no longer readable. The read logic typically performs several retries until a soft error threshold is exceeded. Disk drives specify this rate in terms of bits read per unreadable sector.
Slide 16 of 25

Hard Error Rate for Disks

Drive Type         One sector per X bits   Byte equivalent   PByte equivalent
Consumer SATA      10E14                   1.25E+14          0.11
Enterprise SATA    10E15                   1.25E+15          1.11
Enterprise SAS     10E16                   1.25E+16          11.10

- By reading ~110 TB of data on consumer drives you can expect one sector that cannot be read
- Currently, for all vendors, this will cause the drive to fail
- For disk drives, hard errors are a fact of life: always have been and always will be
Slide 17 of 25
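The byte and PByte columns follow directly from the published rates. A minimal sketch of that conversion, assuming (as the table's own figures imply) that "10E14" means 10 x 10^14 bits and that PByte is binary (2^50 bytes):

```python
# Convert a hard-error rate (one unreadable sector per N bits) into the
# volume of data you can read before expecting one hard error.
rates_bits = {
    "Consumer SATA": 10e14,     # i.e. 1e15 bits per hard error
    "Enterprise SATA": 10e15,
    "Enterprise SAS": 10e16,
}
for drive, bits in rates_bits.items():
    byte_equiv = bits / 8            # bytes per expected hard error
    pbyte_equiv = byte_equiv / 2**50 # binary petabytes
    print(f"{drive}: {byte_equiv:.3g} bytes = {pbyte_equiv:.2f} PB")
```

The consumer-SATA result, ~0.11 PB, is the "~110 TB of data" figure on the slide.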

Time to Read/Write a Drive

Drive Type        Estimated time to read a 2 TB drive (seconds)
Consumer SATA     24390.24
Enterprise SATA   21762.79

- This time grows with each new drive generation, given increased density
- Performance increases at most 20% per generation while density often increases 2x
- This will become untenable with 4 TB drives, with a consumer drive taking over 10 hours to rebuild
Slide 18 of 25
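The "over 10 hours" figure can be checked from the table. A sketch, assuming decimal terabytes and a sustained rate derived from the consumer-drive row:

```python
# Implied sustained read rate of the 2 TB consumer drive, from the table.
rate_bps = 2e12 / 24390.24           # ~82 MB/s

# Next generation: density doubles (4 TB) while performance grows at most 20%.
rebuild_s = 4e12 / (rate_bps * 1.2)
print(f"4 TB rebuild: {rebuild_s / 3600:.1f} hours")  # ~11.3 hours
```

Even granting the full 20% speed bump, the 4 TB rebuild comes out past 11 hours, confirming the slide's claim.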

Network Bandwidth

OC Channel   Speed (MB/sec)   Consumer SATA drives   Enterprise SATA drives
OC-48        276              3.37                   3.04
OC-192       1106             13.49                  12.15
OC-384       4424             53.95                  48.61
OC-768       17695            215.79                 194.45

- How many drives are needed to saturate a channel in bandwidth terms, which is what replication requires
- Does not include data access
- OC-768 is the internet backbone
- OC-192 POPs are few and far between, even today
Slide 19 of 25
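The drive counts are simply channel bandwidth divided by sustained drive bandwidth. A sketch reproducing the consumer column; the 82 MB/sec per-drive figure is inferred from the table (and from the read-time slide), not stated outright:

```python
# Drives needed to saturate each OC channel, assuming ~82 MB/s sustained
# per consumer SATA drive.
oc_speeds_mb = {"OC-48": 276, "OC-192": 1106, "OC-384": 4424, "OC-768": 17695}
DRIVE_MB_S = 82

for channel, speed in oc_speeds_mb.items():
    print(f"{channel}: {speed / DRIVE_MB_S:.2f} consumer SATA drives")
```

A handful of commodity drives can fill an OC-48; even the backbone-class OC-768 is saturated by a couple hundred.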

Annualized Failure Rate

Expected drive failures per day (2 TB drives):

Drive Type                    1 PB (500 drives)   5 PB (2,500)   10 PB (5,000)   25 PB (12,500)
Consumer SATA (AFR 1.24%)     0.02                0.08           0.17            0.42
Enterprise SATA (AFR 0.73%)   0.01                0.05           0.10            0.25

- Vendor-published numbers: best-case expected number of drive failures per day
- Many factors make this far worse; the Google and CMU studies show far greater failure rates
- This is best-case vendor data across multiple drive vendors
Slide 20 of 25
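The per-day figures follow from capacity, drive size, and AFR. A sketch using the slide's 2 TB drives:

```python
# Expected drive failures per day: (drive count) x AFR / 365.
def failures_per_day(capacity_pb, afr):
    drives = capacity_pb * 1000 / 2   # 2 TB drives, decimal units
    return drives * afr / 365

for pb in (1, 5, 10, 25):
    print(pb, "PB:",
          round(failures_per_day(pb, 0.0124), 2),   # consumer SATA
          round(failures_per_day(pb, 0.0073), 2))   # enterprise SATA
```

At 25 PB a consumer-drive system is replacing a drive roughly every two to three days on vendor best-case numbers alone.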

BER Failures

Expected daily drive failures due to BER, at 5% of theoretical drive bandwidth:

Drive Type         1 PB   5 PB   10 PB   25 PB
Consumer SATA      1.5    7.5    15.0    37.6
Enterprise SATA    0.2    0.9    1.8     4.4

At 7.5% of theoretical drive bandwidth:

Drive Type         1 PB   5 PB   10 PB   25 PB
Consumer SATA      2.2    11.2   22.5    56.1
Enterprise SATA    0.3    1.3    2.6     6.5

- Failures due to BER depend on storage usage; based on statistical analysis of ECC failures within the drive
Slide 21 of 25
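The model behind these numbers is: data actually read per day at the given utilization, divided by the bytes-per-hard-error rate from the earlier table. A sketch; the ~82 MB/sec per-drive bandwidth is an assumption carried over from the read-time slide, and it reproduces the table only to within roughly 5%:

```python
# Expected daily hard errors, driven by the volume of data read.
BYTES_PER_HARD_ERROR = 1.25e14   # consumer SATA, from the hard-error slide
DRIVE_B_S = 82e6                 # assumed sustained bandwidth per drive

def ber_failures_per_day(capacity_pb, utilization):
    drives = capacity_pb * 500                         # 2 TB drives
    bytes_read = drives * DRIVE_B_S * utilization * 86400
    return bytes_read / BYTES_PER_HARD_ERROR

print(ber_failures_per_day(1, 0.05))     # ~1.4 vs. 1.5 on the slide
print(ber_failures_per_day(25, 0.075))   # ~53 vs. 56.1 on the slide
```

Note how BER-driven failures dwarf AFR-driven ones: 1.5 drives/day vs. 0.02 drives/day at 1 PB.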

Replication Time for Failed Drives

MB per day that must be replicated to recover failed consumer SATA drives:

Drive usage   1 PB        10 PB        25 PB         50 PB
5%            3,005,553   30,055,531   75,138,828    150,277,655
7.5%          4,047,145   40,471,446   101,178,614   202,357,228

- This is the number of MB per day required to replicate failed drives
- Compare it with the channel's daily capacity: its MB/sec x 24 x 3600
- If the channel's daily capacity minus this value is positive, replication fits within the channel bandwidth
Slide 22 of 25
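These figures are consistent with the earlier failure tables: daily failed drives times the 2,000,000 MB that each 2 TB drive holds. A sketch using the slide's own numbers:

```python
# Replication load: failed drives per day x MB per 2 TB drive.
MB_PER_DRIVE = 2_000_000
daily_failures_5pct = {1: 1.5, 10: 15.0, 25: 37.6}  # BER slide, consumer SATA

for pb, fails in daily_failures_5pct.items():
    print(f"{pb} PB: {fails * MB_PER_DRIVE:,.0f} MB/day")
# 1 PB -> 3,000,000 MB/day, matching the table's 3,005,553 within rounding
```

In other words, every failed 2 TB drive adds 2 TB of replication traffic that must cross the wide-area link.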

Implications

Channel capacity per day minus replication requirement (MB/day, consumer SATA drives):

Drive usage   OC Channel   1 PB         5 PB         10 PB         25 PB
5%            OC-48        20,882,319   8,860,106    -6,167,659    -51,250,956
7.5%          OC-48        19,840,727   3,652,149    -16,583,574   -77,290,742
5%            OC-192       92,545,935   80,523,722   65,495,957    20,412,660
7.5%          OC-192       91,504,343   75,315,765   55,080,042    -5,627,126

- At 10 PB and 5% drive utilization the replication load exceeds an OC-48 channel; 71.4 MB/sec more is needed
- Drive failures will eventually cause the loss of data
- This does not take into account two drives holding the same data failing at once; it will happen
Slide 23 of 25
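The sign test in this table is just channel capacity per day minus the replication requirement. A sketch of the failing OC-48 / 10 PB case; the OC-48 speed comes from the bandwidth slide, and the small difference from the table's -6,167,659 suggests the slide used a slightly different line rate:

```python
# Does OC-48 have headroom for 10 PB of consumer SATA at 5% utilization?
OC48_MB_S = 276
channel_mb_day = OC48_MB_S * 24 * 3600    # 23,846,400 MB/day
rebuild_mb_day = 30_055_531               # from the replication slide

surplus = channel_mb_day - rebuild_mb_day # negative: channel too small
print(f"surplus: {surplus:,} MB/day ({surplus / 86400:.1f} MB/s)")
# about -72 MB/s here; the slide's table computes a 71.4 MB/s shortfall
```

Once the surplus goes negative, replication falls permanently behind: the backlog grows every day until data is lost.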

Next Steps: What needs to be considered

What to Do About Drive Rebuild Times
- In some cases, with large data throughput, moving to the cloud is too costly
- Data loss will increase with each new drive generation
- Storage Class Memory (SSDs) will probably not help
Slide 25 of 25

Thank You