Highly Scalable, Non-RDMA NVMe Fabric. Bob Hansen,, VP System Architecture

Similar documents
Accelerating Real-Time Big Data. Breaking the limitations of captive NVMe storage

VMware Virtual SAN Technology

Architected for Performance. NVMe over Fabrics. September 20 th, Brandon Hoff, Broadcom.

DataON and Intel Select Hyper-Converged Infrastructure (HCI) Maximizes IOPS Performance for Windows Server Software-Defined Storage

SOFTWARE-DEFINED BLOCK STORAGE FOR HYPERSCALE APPLICATIONS

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

N V M e o v e r F a b r i c s -

Introduction to Database Services

Top 5 Reasons to Consider

Designing Next Generation FS for NVMe and NVMe-oF

SAP HANA IBM x3850 X6

FMS18 Invited Session 101-B1 Hardware Acceleration Techniques for NVMe-over-Fabric

InfiniBand Networked Flash Storage

DDN. DDN Updates. Data DirectNeworks Japan, Inc Shuichi Ihara. DDN Storage 2017 DDN Storage

Storage Protocol Offload for Virtualized Environments Session 301-F

Low-Overhead Flash Disaggregation via NVMe-over-Fabrics Vijay Balakrishnan Memory Solutions Lab. Samsung Semiconductor, Inc.

Modern hyperconverged infrastructure. Karel Rudišar Systems Engineer, Vmware Inc.

Optimizing the Data Center with an End to End Solutions Approach

The Magic and Mystery of In-Memory Apps. Taufik Ma Industry Insight Shaun Walsh - Marketeer

Oracle Exadata: Strategy and Roadmap

Annual Update on Flash Memory for Non-Technologists

NVM Express Awakening a New Storage and Networking Titan Shaun Walsh G2M Research

2017 Storage Developer Conference. Mellanox Technologies. All Rights Reserved.

Life In The Flash Director - EMC Flash Strategy (Cross BU)

The Oracle Database Appliance I/O and Performance Architecture

Flash Storage Complementing a Data Lake for Real-Time Insight

2014 VMware Inc. All rights reserved.

NVM Express over Fabrics Storage Solutions for Real-time Analytics

Extending RDMA for Persistent Memory over Fabrics. Live Webcast October 25, 2018

Next-Generation NVMe-Native Parallel Filesystem for Accelerating HPC Workloads

Tiered Storage and Caching Decisions

Hitachi Virtual Storage Platform Family

Isilon Scale Out NAS. Morten Petersen, Senior Systems Engineer, Isilon Division

Cloud Meets Big Data For VMware Environments

Low-Overhead Flash Disaggregation via NVMe-over-Fabrics

I/O Virtualization: Enabling New Architectures in the Data Center for Server and I/O Connectivity Sujoy Sen Micron

NVM PCIe Networked Flash Storage

Accelerating Ceph with Flash and High Speed Networks

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

MICHAËL BORGERS, SYSTEM ENGINEER MATTHIAS SPELIER, SYSTEM ENGINEER. Software Defined Datacenter

Scaling Internet TV Content Delivery ALEX GUTARIN DIRECTOR OF ENGINEERING, NETFLIX

The storage challenges of virtualized environments

Optimizing Quality of Service with SAP HANA on Power Rapid Cold Start

Enterprise Architectures The Pace Accelerates Camberley Bates Managing Partner & Analyst

DRBD SDS. Open Source Software defined Storage for Block IO - Appliances and Cloud Philipp Reisner. Flash Memory Summit 2016 Santa Clara, CA 1

Company. Intellectual Property. Headquartered in the Silicon Valley

IBM FlashSystem. IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein?

DDN. DDN Updates. DataDirect Neworks Japan, Inc Nobu Hashizume. DDN Storage 2018 DDN Storage 1

RDMA Requirements for High Availability in the NVM Programming Model

Birds of a Feather Presentation

Copyright 2012 EMC Corporation. All rights reserved.

Opportunities from our Compute, Network, and Storage Inflection Points

Storage. CS 3410 Computer System Organization & Programming

STORAGE CONSOLIDATION WITH IP STORAGE. David Dale, NetApp

THE SUMMARY. CLUSTER SERIES - pg. 3. ULTRA SERIES - pg. 5. EXTREME SERIES - pg. 9

Increasing Performance of Existing Oracle RAC up to 10X

INFINIDAT Storage Architecture. White Paper

From Rack Scale to Network Scale: NVMe Over Fabrics Enables Exabyte Applica>ons. Zivan Ori, CEO & Co-founder, E8 Storage

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research

Learn Your Alphabet - SRIOV, NPIV, RoCE, iwarp to Pump Up Virtual Infrastructure Performance

Take control of storage performance

NVMe over Universal RDMA Fabrics

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

All Roads Lead to Convergence

Universal Storage. Innovation to Break Decades of Tradeoffs VASTDATA.COM

Reconstruyendo una Nube Privada con la Innovadora Hiper-Convergencia Infraestructura Huawei FusionCube Hiper-Convergente

Enhancing NVMe-oF Capabilities Using Storage Abstraction

Benefits of 25, 40, and 50GbE Networks for Ceph and Hyper- Converged Infrastructure John F. Kim Mellanox Technologies

How to Network Flash Storage Efficiently at Hyperscale. Flash Memory Summit 2017 Santa Clara, CA 1

Oracle Exadata X7. Uwe Kirchhoff Oracle ACS - Delivery Senior Principal Service Delivery Engineer

MANAGING MULTI-TIERED NON-VOLATILE MEMORY SYSTEMS FOR COST AND PERFORMANCE 8/9/16

SAS Technical Update Connectivity Roadmap and MultiLink SAS Initiative Jay Neer Molex Corporation Marty Czekalski Seagate Technology LLC

NVMe Over Fabrics (NVMe-oF)

Copyright 2012 EMC Corporation. All rights reserved.

The Role of Database Aware Flash Technologies in Accelerating Mission- Critical Databases

Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale.

Introducing Tegile. Company Overview. Product Overview. Solutions & Use Cases. Partnering with Tegile

FLASH.NEXT. Zero to One Million IOPS in a Flash. Ahmed Iraqi Account Systems Engineer North Africa

HP VMA-series Memory Arrays

New Approach to Unstructured Data

Copyright 2012 EMC Corporation. All rights reserved.

Evolution of Rack Scale Architecture Storage

Windows Support for PM. Tom Talpey, Microsoft

Deep Learning Performance and Cost Evaluation

Toward a Memory-centric Architecture

Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO

Building Your Own Robust and Powerful Software Defined Storage with VMware vsan. Tips on Choosing Hardware for vsan Deployment

DDN About Us Solving Large Enterprise and Web Scale Challenges

Reference Design: NVMe-oF JBOF

Architecture of a Real-Time Operational DBMS

Building dense NVMe storage

Deep Learning Performance and Cost Evaluation

Leveraging Software-Defined Storage to Meet Today and Tomorrow s Infrastructure Demands

Chelsio Communications. Meeting Today s Datacenter Challenges. Produced by Tabor Custom Publishing in conjunction with: CUSTOM PUBLISHING

An Exploration into Object Storage for Exascale Supercomputers. Raghu Chandrasekar

Unified Management for Virtual Storage

Adapted from: TRENDS AND ATTRIBUTES OF HORIZONTAL AND VERTICAL COMPUTING ARCHITECTURES

PSAM, NEC PCIe SSD Appliance for Microsoft SQL Server (Reference Architecture) September 4 th, 2014 NEC Corporation

The Fastest Scale-Out NAS

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research

Transcription:

A Cost Effective,, High g Performance,, Highly Scalable, Non-RDMA NVMe Fabric Bob Hansen,, VP System Architecture bob@apeirondata.com Storage Developers Conference, September 2015

Agenda 3 rd Platform Opportunity for High Performance Storage Applications with enhanced user experience require: High IOP performance & low-latency Storage performance = $$ PROFITS Scalability Scale out, in-memory compute/storage architecture evolution In-memory => in-box flash => external flash The ideal solution Use cases Apeiron s Shared DAS Architecture Software with HW acceleration Apeiron Data Fabric System architecture 2

3 rd Platform Opportunity 3

Enhanced User Experience Applications driving high IOP/low latency storage performance > Customer personalization and simplified data management NoSQL Cluster Web / App Layer Web Personalization Customer personalization > Fortune 500 companies mid-layer meta cache rapidly growing > Kayak Caching aged airline quotes to speed service > Netflix Personalization for >50M customers Structured data store > Amadeus 3.7 Million Bookings per Day 4

Ad Tech Example Deliver ad Log transaction Web page request Log results Lose Match request to user Fast Lookup (<1ms) Read consumer profile Update profile Match user to ad Deliver Ad within ms Win Bid for ad Lookup Available Ads Rank adds for best match Determine best ROI >1 billion consumers >3 billion devices Storage IOPs / latency = $$ 5

NoSQL solution Scale out nodes with dataset in-memory Application Servers switch... IC N IC NI C NI C NI Scale-out in-memory goodness Shared nothing compute nodes scale well Database is sharded evenly across all nodes Data set in-memory is VERY FAST To scale just add another node, shard the DB again and go Issues can be VERY expensive Node failure = very long recovery time Data at risk during recovery As data set grows more servers must be added = higher cost and foot print CPU to mem ratio can not be optimized This breaks down as you approach 100TB 6

Expensive? Add Internal Flash Application Servers switch... Flash Flash Flash Flash Scale-out in-memory goodness Share nothing compute nodes scale well Database is sharded evenly across all nodes Data set in-memory is VERY FAST Data in flash is FAST To scale just add another node, shard the DB again and go Storage Management is a Pain! Issues Flash size must be equal on all nodes Adding storage = downtime Node failure = very long recovery time Data at risk during recovery As data set grows more nodes must be added = higher cost and foot print CPU to mem ratio can not be optimized 7

Very High Performance External Storage is the answer Application Servers switch... HBA HBA HBA HBA Storage Network Shared DAS Goodness CPU and Storage scale independently Minimize cost / rack space Improved CPU utilization Fine Grain, On-line provisioning Server failures don t take out data Minimize failure recovery time Issues Performance IOPs and Predictable Latency Availability HA design and Replicas Scale PBs and 100s of nodes 8

The Ideal Solution - Shared Direct Attached Storage Best performing persistent storage media Standard NVMe SSDs also best cost Bare metal Ethernet storage network HW Low cost, industry standard networking Add value where you get the best ROI HW Accelerated, Networked Data path NVMe SSD Virtualization High availability with no performance penalty Best in class management On-line provisioning and failure recovery Storage performance statistics / predictive modeling Keep it simple! Deliver raw NVMe performance to the application 9

Application Use Cases Scale-out High Bandwidth Fast cache Pooled Flash For Data storage for applications Tiered storage Operational Big Data with high bandwidth acceleration node01 node02 node03 node04 SAN Web Layer Customer Personalization DB partition 1 SAP HANA DB Worker node Index server Statistic server SAP HANA Studio DB partition 2 SAP HANA DB Worker node Index server Statistic server SAP HANA DB DB partition 3 SAP HANA DB Worker node Index server Statistic server Shared file system - GPFS SAP HANA DB Standby node Index server Statistic server Scale-Up Compute Complex Scale-Up Compute Complex NoSQL Cluster HD D data01 Fla sh log01 HD D data02 Fla sh log02 HD D data03 Fla sh log03 HD D Fla sh Apeiron Fabric to SAS Bridge Apeiron Fabric to SATA Bridge SAS SAS SAS SATA SATA SATA Structured data store Ad Tech RTB Fraud detection Customer personalization In-memory check points requiring massive bandwidth SAS SAS SAS SATA SATA SATA Accelerates response to time critical data Seamless scaling 10

Apeiron s Solution - Shared DAS > Scale-out NVM storage architecture > Intelligent software with hardware accelerated data path > Ethernet storage fabric with <3uS round-trip latency overhead > Seamless scaling to petabytes Apeiron Shared DAS Cluster Memory Memory Application Servers... Memory Memory Storage Processing On Application Servers Apeiron Storage Virtualization Technology Flash commands encapsulated in Ethernet Slice Switched Storage Fabric Slice Slice... Slice Switched Storage Fabric Slice Flash array(s)... = SSDs + switch chip + FPGAs Only......... Slice Scale u p ADS1000 ADS1000 Scale out 11

Apeiron s Software with Hardware acceleration Instant Failover Automatic Replication Transparent Server to Server Remap Mirror Meta Data Reduces node rebuild time from >10hrs to <1sec Transparent backup Hardware assisted SW configured Reduces network congestion Accelerates DB manageability Remaps metadata to spare Increases OP/s 40% Increases application throughput Pooled external storage at near Performance Faster response generates more profits Dataset 12

Why not PCIe on a rope? A PCIe storage network is possible but faces several challenges - PCIe is not a network PCIe is an evolution and extension to a parallel system bus Initially scoped to support a handful of devices PCIe was not designed to be resilient Bus errors = panic Failure isolation is a work in progress There are currently no PCIe networking standards Why re-invent PCIe as a high cost, very complex external storage fabric? 13

RDMA / Apeiron Data Fabric Comparison NVMe Host Software NVMe Host Transport Abstraction NVMe RDMA RDMA Verbs iwarp Infiniband RoCE Fabrics iwarp Infiniband RoCE ADF Et thernet ADF <3 S Apeiron Data Fabric Simple, Robust Optimized for Ethernet and NVMe light weight Layer 2 RDMA Verbs NVMe RDMA Controller Transport Abstraction NVMe Controller Software ADF L2 Ethernet t Headers + 4B NoE Header/encapsulated dpcie TLPs 42B ~ > 112B Overhead The standard is not tied to any particular physical layer RDMA approach adds between 26B and 96B of headers, in addition to NVMe Encapsulation Flexible but adds complexity, link consumption and latency! 15

Apeiron System Architecture Shared DAS Application Servers switch Deliver raw NVMe performance... to the application HW HBA Accelerated Storage HBA Processing Offload On HBA Application Servers HBA 40Gb Bare Metal Ethernet Storage Network Apeiron Data Fabric Apeiron Data Fabric...... Very High Performance Network to SSD interface... Simple, scalable architecture with better than in-box flash performance Highly available, shared storage using standard SSDs and networking components Virtualized storage, on-line provisioning, failure isolation 15

Apeiron Technology Delivers > NVMe Virtualization > Performance Density 18M IOPs, 72GB/s BW In a 2U form factor > <90 S 4K read latency P99 (NAND flash) Ready for 3D Xpoint (<3 S Fabric Latency) Scale up Scale out Shared DAS Apeiron Virtualization Technology Apeiron Data Fabric Industry Standard NVM Technology 8/12/2015 16

All the simplicity and promise of DAS with the efficiency and capability of network t k attached tt h d storage. t