Why Scale-Out Big Data Apps Need A New Scale-Out Storage


Why Scale-Out Big Data Apps Need A New Scale-Out Storage
Modern storage for modern business
Rob Whiteley, VP, Marketing, Hedvig
April 9, 2015

Agenda
- Big data pressures on storage infrastructure
- The rise of elastic software-defined storage (SDS)
- 6 SDS capabilities for big data
- 3 case studies of SDS for big data

Copyright 2015 Hedvig Inc. Confidential.

Big data pressures on storage infrastructure

Big data requires flexible infrastructure
[Figure: big data, time-to-market, and flexible infrastructure, shown alongside business executives, developers, and IT infrastructure & DevOps]

According to Forrester...
- 10X faster growth of enterprise data than storage budgets
- 58% of orgs take days, weeks, or months to provision storage
- 14% of orgs have cloud-like provisioning capabilities
Source: Forrester Technology Adoption Profile: Meet Evolving Business Demands With Software-Defined Storage, March 2015. Visit hedviginc.com for full research report.

Three truths and a lie about storage & big data
- Software-defined storage is the right direction.
- Hyperconverged provides the best economics.
- Big data apps are repeating the sins of the 90s.
- Hyperscale helps virtualize Hadoop and NoSQL.

The rise of elastic software-defined storage (SDS)

A big data inflection point in storage

Storage capabilities before and after the big data software storage inflection point:

Before                      After
Hardware-defined            Software-defined
Scale-out                   Elastic
High-availability + RAID    Distributed + Replication
Hyperconverged              Hyperconverged + Hyperscale

[Figure: storage capabilities plotted against price/performance, moving from traditional scale-up to scale-out to elastic]

Three legs to the big data requirements stool
- Deployment flexibility
- Storage flavors
- Storage features
[Figure: the diagram also labels virtual SANs, software-defined storage, monolithic arrays, and hyperconverged]

Hyperscale vs. hyperconverged
[Figure: two deployment models on Linux, hypervisor, and Windows hosts. Hyperscale: a Hadoop/NoSQL cluster and a separate storage cluster. Hyperconverged: a single Hadoop/NoSQL + storage cluster. Legend: storage client, storage node]

How SDS provides elastic storage for big data
1. Admin provisions virtual volumes and scripts or applies storage policies.
2. Virtual volume presents block, file, & object storage to big data hosts (iSCSI, NFS, object).
3. Storage client captures guest I/O and communicates to the underlying cluster.
4. Cluster distributes and replicates data, and applies compression & dedupe.
5. Cluster autotiers & balances to optimize data locality & availability.
[Figure: big data hosts attached to a storage cluster of x86 or ARM servers spanning DC1, DC2, and Cloud3]

6 SDS capabilities for big data

6 big data friendly SDS capabilities
1. I/O sequentialization
2. Tunable replication
3. DR replication
4. Disk failures and rebuilds
5. Data efficiency methods
6. Flash caching & flash pinning

1. Random I/O to sequential writes
   1. Application writes data in random blocks and gets an immediate ack from the cluster.
   2. Storage cluster sequentializes incoming blocks (in RAM+SSD) into larger chunks.
   3. Storage cluster writes the larger sequentialized data chunks to underlying disks in an auto-balanced and auto-distributed manner, according to policy.
[Figure legend: big data node, storage client, storage node]
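The three steps on this slide can be illustrated with a small in-memory sketch. The slide does not say how the cluster sequentializes blocks, so this example makes an assumption: staged blocks are sorted by block address and flushed as one chunk. The class name, the `chunk_blocks` threshold, and the use of a dict for the RAM+SSD staging area are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class WriteSequentializer:
    """Stages random block writes and flushes them as one ordered chunk.

    Hypothetical illustration; names and sizes are assumptions, not Hedvig's design.
    """
    chunk_blocks: int = 4                         # blocks per flushed chunk (assumed)
    buffer: dict = field(default_factory=dict)    # stands in for the RAM+SSD staging area
    flushed: list = field(default_factory=list)   # stands in for the underlying disks

    def write(self, block_no: int, data: bytes) -> str:
        # Stage the block and acknowledge right away, before any disk write.
        self.buffer[block_no] = data
        if len(self.buffer) >= self.chunk_blocks:
            self.flush()
        return "ack"

    def flush(self) -> None:
        if not self.buffer:
            return
        # Order staged blocks by address so the disk receives one sequential chunk.
        chunk = sorted(self.buffer.items())
        self.flushed.append(chunk)
        self.buffer.clear()


seq = WriteSequentializer(chunk_blocks=4)
for block_no in (907, 12, 455, 13):  # random-looking block addresses
    seq.write(block_no, b"x")
print([block_no for block_no, _ in seq.flushed[0]])  # prints [12, 13, 455, 907]
```

A log-structured design that appends blocks in arrival order and keeps an index would also turn random writes into sequential ones; sorting is used here only because it is the shortest way to show the idea.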

Example: Single write operation
Example Policy: 3 COPIES; AGNOSTIC
1. Application sends write to any storage cluster node (round-robin).
2. Cluster node writes first aggregated blocks locally. Second copy is written to the first responding cluster node.
3. Ack is sent back to the big data node after a majority quorum of acks (2 acks in the case of 3 copies). CHECKSUMMED!
4. Third copy is written semi-synchronously. It could also be synchronous if all servers are equidistant.
[Figure legend: big data node, SSD/flash, SAS/SATA, Hedvig Controller software, Hedvig Cluster software]
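The quorum rule in this example (2 acks for 3 copies) can be written as a short calculation. The sketch below is an illustration under assumptions: the function names are hypothetical, and replica responses are modeled as a simple list of booleans in the order they arrive.

```python
def majority_quorum(copies: int) -> int:
    """Smallest number of acks that is more than half of the copies."""
    return copies // 2 + 1


def write_acknowledged_after(replica_results, copies=3):
    """Return how many replica responses arrive before the client gets its ack.

    replica_results: booleans in arrival order (True = replica stored the data).
    Returns None if the quorum is never reached.
    """
    needed = majority_quorum(copies)
    acks = 0
    for position, ok in enumerate(replica_results, start=1):
        if ok:
            acks += 1
        if acks == needed:
            return position
    return None


print(majority_quorum(3))                               # prints 2
print(write_acknowledged_after([True, True, True]))     # prints 2
print(write_acknowledged_after([True, False, True]))    # prints 3
print(write_acknowledged_after([True, False, False]))   # prints None
```

With three copies, the client is acknowledged as soon as two replicas respond, which matches the slide's description of the third copy completing semi-synchronously afterwards.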

2. Granular replication of data
Chunks are distributed across all servers and containers in the storage cluster.
[Figure: granular data chunks from a big data node spread across storage containers and disk platters. Legend: Hedvig Controller software, Hedvig Cluster software]

3. DR Policy: 3xDC-aware with 3 copies
- DR Policy: Datacenter-aware (one copy per DC)
- Data Copies: 3
- Sync-Acknowledgements: 2
[Figure: active-active Data Center A, Data Center B, and Data Center C. Legend: Hedvig Controller software, Hedvig Cluster software]
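A datacenter-aware policy with one copy per DC amounts to a placement rule. The following sketch shows one way such a rule could look; the function name, the node names, and the choice of the first node in each data center are assumptions made for illustration, and the Sync-Acknowledgements setting is not modeled.

```python
def place_one_copy_per_dc(nodes_by_dc, copies=3):
    """Pick one storage node in each of `copies` data centers.

    nodes_by_dc: mapping of data center name -> list of node names (hypothetical).
    Returns a list of (data_center, node) pairs, one per copy.
    """
    if len(nodes_by_dc) < copies:
        raise ValueError("need at least one data center per copy")
    placement = []
    for dc in sorted(nodes_by_dc)[:copies]:
        nodes = nodes_by_dc[dc]
        if not nodes:
            raise ValueError(f"data center {dc} has no storage nodes")
        # Assumption: take the first node; a real cluster would balance load.
        placement.append((dc, nodes[0]))
    return placement


cluster = {
    "DC-A": ["a1", "a2"],
    "DC-B": ["b1"],
    "DC-C": ["c1", "c2"],
}
print(place_one_copy_per_dc(cluster))
# prints [('DC-A', 'a1'), ('DC-B', 'b1'), ('DC-C', 'c1')]
```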

4. Disk failures and rebuilds
- Disks are managed in protection groups.
- Disk rebuilds are initiated automatically upon disk failure, across the entire cluster.
- No spare disks needed.
- Quick wide-stripe rebuilds allow for the largest disks.
- Average 4TB disk rebuild time is under 20 minutes.
- Easily supports 6TB, 8TB and 10TB drives.
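A quick back-of-the-envelope calculation shows why a rebuild of this speed depends on wide striping. The sketch assumes a full 4 TB disk, decimal units, and a per-disk sustained rate of 150 MB/s; that per-disk rate is an assumption chosen for illustration and does not come from the slide.

```python
import math


def required_rebuild_throughput_gb_s(capacity_tb: float, minutes: float) -> float:
    """Aggregate GB/s needed to rewrite a full disk in the given time (decimal units)."""
    return capacity_tb * 1000 / (minutes * 60)


def disks_needed(aggregate_gb_s: float, per_disk_mb_s: float) -> int:
    """Number of disks that must share the rebuild at an assumed per-disk rate."""
    return math.ceil(aggregate_gb_s * 1000 / per_disk_mb_s)


ASSUMED_PER_DISK_MB_S = 150  # assumption, not a figure from the slide

throughput = required_rebuild_throughput_gb_s(capacity_tb=4, minutes=20)
print(f"{throughput:.2f} GB/s aggregate")
print(disks_needed(throughput, ASSUMED_PER_DISK_MB_S), "disks sharing the rebuild")
```

Under these assumptions the rebuild needs roughly 3.3 GB/s in aggregate, far more than a single spinning disk can absorb, so the work has to be spread across a couple of dozen disks. A disk that is only partly full would need proportionally less.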

5. Thin provisioning, deduplication and compression
- Thin provisioning for every virtual volume
- Inline compression and deduplication
- Global, system-wide deduplication: all attached storage nodes participate
- 60-75% data reduction; dedupe rates vary based on data type
- Dedupe cache can reside on Controller SSD/flash in the application server
- Eliminate all duplicate I/O from the network, dramatically lower latency and increase IOPS!
- Clone non-deduped volume with dedupe
[Figure: big data node + storage client with client-side SSD/flash dedupe read cache and dedupe map; storage node with cluster SSD/flash read+write cache]
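Inline deduplication with compression can be sketched as a content-addressed chunk store. This is a generic illustration, not Hedvig's implementation: the class name, the 4 KB chunk size, SHA-256 fingerprints, and zlib compression are all assumptions.

```python
import hashlib
import zlib


class DedupeStore:
    """Stores each unique chunk once (compressed) and shares it across volumes."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size  # assumed chunk size
        self.chunks = {}              # fingerprint -> compressed chunk bytes
        self.volumes = {}             # volume name -> ordered list of fingerprints

    def write(self, volume: str, data: bytes) -> None:
        refs = self.volumes.setdefault(volume, [])
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # Only chunks not seen before are compressed and stored.
            if fingerprint not in self.chunks:
                self.chunks[fingerprint] = zlib.compress(chunk)
            refs.append(fingerprint)

    def read(self, volume: str) -> bytes:
        return b"".join(zlib.decompress(self.chunks[fp]) for fp in self.volumes[volume])


store = DedupeStore()
payload = b"A" * 8192            # two identical 4 KB chunks
store.write("vol1", payload)
store.write("vol2", payload)     # a second volume holding the same data
print(len(store.chunks))         # prints 1
print(store.read("vol2") == payload)  # prints True
```

Because the fingerprint map is global to the store, the second volume adds only references, which is the effect the slide describes as system-wide deduplication.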

6. 3 ways Hedvig uses SSD and PCIe flash
1. Client-side read and dedupe cache on the big data node (big data node + storage client)
2. Read/write cache on storage nodes
3. Primary storage as a dedicated volume on storage nodes + flash pinning
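The difference between flash caching and flash pinning can be shown with a small cache model. The slide does not define pinning, so this sketch assumes it means that pinned data is never evicted from flash, while unpinned data follows least-recently-used eviction. The class and method names are hypothetical.

```python
from collections import OrderedDict


class FlashCache:
    """LRU cache in which pinned keys are exempt from eviction (assumed semantics)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()  # key -> value, oldest first
        self.pinned = set()

    def pin(self, key) -> None:
        self.pinned.add(key)

    def put(self, key, value) -> None:
        self.entries[key] = value
        self.entries.move_to_end(key)
        self._evict()

    def get(self, key):
        if key not in self.entries:
            return None
        self.entries.move_to_end(key)  # mark as recently used
        return self.entries[key]

    def _evict(self) -> None:
        while len(self.entries) > self.capacity:
            # Evict the least recently used entry that is not pinned.
            victim = next((k for k in self.entries if k not in self.pinned), None)
            if victim is None:
                break  # everything is pinned; this sketch allows the overflow
            del self.entries[victim]


cache = FlashCache(capacity=2)
cache.pin("index")
cache.put("index", 1)
cache.put("block-7", 2)
cache.put("block-8", 3)        # over capacity: "block-7" is evicted, "index" stays
print(cache.get("index"))      # prints 1
print(cache.get("block-7"))    # prints None
```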

3 case studies of SDS for big data

Three case studies

Fortune 100 bank
- Deploying Cassandra and MongoDB for developers, with infrastructure self-provisioning for a DevOps model.
- Multiple NoSQL deployments were leading to islands of (elastic) storage and an inability to self-provision or plug into the bank's orchestration tools.
- Building an elastic SDS cluster on commodity infrastructure to lower cost per bit by [...] and drive self-provisioning through RESTful APIs.

Fortune 50 telecom
- Seeks centralized, shared storage to virtualize 3 Hadoop distributions: Hortonworks, Cloudera, and MapR.
- Multiple Hadoop deployments were leading to islands of (also elastic) storage and preventing IT's virtualization-first policy.
- Virtualizing all three Hadoop distributions and deploying SDS as the data lake for scale-out storage and global dedupe across data sets.

4th-largest US law firm
- Needs quick, reliable indexing of 100M active client docs in HP Autonomy.
- Needed a scale-out, flash-friendly solution to replace local SSDs, which are required to achieve sub-one-second index queries.
- Getting 6x performance with SDS versus a traditional hybrid array, which included a flash tier; now has incremental commodity scalability.

Thank you!