Cloud Computing with FPGA-based NVMe SSDs

Similar documents
Developing Extremely Low-Latency NVMe SSDs

Aerospike Scales with Google Cloud Platform

Oracle Exadata: Strategy and Roadmap

Intelligent Hybrid Flash Management

Recent Innovations in Data Storage Technologies Dr Roger MacNicol Software Architect

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card

NVMe SSDs Future-proof Apache Cassandra

Re-Architecting Cloud Storage with Intel 3D XPoint Technology and Intel 3D NAND SSDs

Hewlett Packard Enterprise HPE GEN10 PERSISTENT MEMORY PERFORMANCE THROUGH PERSISTENCE

IaaS Vendor Comparison

Oracle Database Exadata Cloud Service Exadata Performance, Cloud Simplicity DATABASE CLOUD SERVICE

Using MRAM to Create Intelligent SSDs

The Role of Database Aware Flash Technologies in Accelerating Mission- Critical Databases

InfiniBand Networked Flash Storage

Using FPGAs to accelerate NVMe-oF based Storage Networks

Was ist dran an einer spezialisierten Data Warehousing platform?

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

Accelerating Data Centers Using NVMe and CUDA

Hardware NVMe implementation on cache and storage systems

Oracle Exadata X7. Uwe Kirchhoff Oracle ACS - Delivery Senior Principal Service Delivery Engineer

Maximizing Fraud Prevention Through Disruptive Architectures Delivering speed at scale.

Accelerating Enterprise Search with Fusion iomemory PCIe Application Accelerators

Computational Storage: Acceleration Through Intelligence & Agility

Achieving Memory Level Performance: Secrets Beyond Shared Flash

Architecture of a Real-Time Operational DBMS

Flash Storage Complementing a Data Lake for Real-Time Insight

HPE SimpliVity. The new powerhouse in hyperconvergence. Boštjan Dolinar HPE. Maribor Lancom

DDN. DDN Updates. Data DirectNeworks Japan, Inc Shuichi Ihara. DDN Storage 2017 DDN Storage

Agenda. AWS Database Services Traditional vs AWS Data services model Amazon RDS Redshift DynamoDB ElastiCache

Cloud Storage with AWS: EFS vs EBS vs S3 AHMAD KARAWASH

Storage Optimization with Oracle Database 11g

PebblesDB: Building Key-Value Stores using Fragmented Log Structured Merge Trees

Accelerating Data Center Workloads with FPGAs

Improving Ceph Performance while Reducing Costs

Performance Benefits of Running RocksDB on Samsung NVMe SSDs

RIDESHARING YOUR CLOUD DATA CENTER Realize Better Resource Utilization with NVMe-oF

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

SOLUTION BRIEF. QxStack vsan ReadyNode -Solution Brief for MSSQL

IBM FlashSystem. IBM FLiP Tool Wie viel schneller kann Ihr IBM i Power Server mit IBM FlashSystem 900 / V9000 Storage sein?

Service Oriented Performance Analysis

Concurrent execution of an analytical workload on a POWER8 server with K40 GPUs A Technology Demonstration

RIDESHARING YOUR CLOUD DATA CENTER Realize Better Resource Utilization with NVMe-oF

All-Flash High-Performance SAN/NAS Solutions for Virtualization & OLTP

Emulex LPe16000B 16Gb Fibre Channel HBA Evaluation

Building an All Flash Server What s the big deal? Isn t it all just plug and play?

Virtual Storage Tier and Beyond

Accelerating Ceph with Flash and High Speed Networks

Session 201-B: Accelerating Enterprise Applications with Flash Memory

Data Processing at the Speed of 100 Gbps using Apache Crail. Patrick Stuedi IBM Research

Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit

Oracle TimesTen Scaleout: Revolutionizing In-Memory Transaction Processing

PCIe Storage Beyond SSDs

CIT 668: System Architecture. Amazon Web Services

4 Myths about in-memory databases busted

Accelerate MySQL for Demanding OLAP and OLTP Use Case with Apache Ignite December 7, 2016

Oracle IaaS, a modern felhő infrastruktúra

ADVANCED IN-MEMORY COMPUTING USING SUPERMICRO MEMX SOLUTION

An Oracle White Paper June Enterprise Database Cloud Deployment with Oracle SuperCluster T5-8

Convergence and Flash: Reinventing Storage (Again)

On-Premises Cloud Platform. Bringing the public cloud, on-premises

How to Network Flash Storage Efficiently at Hyperscale. Flash Memory Summit 2017 Santa Clara, CA 1

Bringing Intelligence to Enterprise Storage Drives

Accelerating NVMe-oF* for VMs with the Storage Performance Development Kit

Building the Most Efficient Machine Learning System

Architecting For Availability, Performance & Networking With ScaleIO

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads

DDN. DDN Updates. DataDirect Neworks Japan, Inc Nobu Hashizume. DDN Storage 2018 DDN Storage 1

Benchmarking Cloud Serving Systems with YCSB 詹剑锋 2012 年 6 月 27 日

Flash In the Data Center

Fast and Easy Persistent Storage for Docker* Containers with Storidge and Intel

Building the Most Efficient Machine Learning System

New Approach to Unstructured Data

Benchmarking Persistent Memory in Computers

BlueDBM: An Appliance for Big Data Analytics*

WITH INTEL TECHNOLOGIES

Optimizing the Data Center with an End to End Solutions Approach

Database Acceleration Solution Using FPGAs and Integrated Flash Storage

Designing High-Performance Non-Volatile Memory-aware RDMA Communication Protocols for Big Data Processing

Pivot3 Acuity with Microsoft SQL Server Reference Architecture

Virtuozzo Hyperconverged Platform Uses Intel Optane SSDs to Accelerate Performance for Containers and VMs

Designing elastic storage architectures leveraging distributed NVMe. Your network becomes your storage!

When, Where & Why to Use NoSQL?

Emerging Technologies for HPC Storage

HPE Deep Learning Cookbook: Recipes to Run Deep Learning Workloads. Natalia Vassilieva, Sergey Serebryakov

Colin Cunningham, Intel Kumaran Siva, Intel Sandeep Mahajan, Oracle 03-Oct :45 p.m. - 5:30 p.m. Moscone West - Room 3020

Deep Learning Performance and Cost Evaluation

Cloud Acceleration. Performance comparison of Cloud vendors. Tobias Deml DOAG2017

Extending the Lifetime of SSD Controller

Replacing the FTL with Cooperative Flash Management

From Rack Scale to Network Scale: NVMe Over Fabrics Enables Exabyte Applica>ons. Zivan Ori, CEO & Co-founder, E8 Storage

Building dense NVMe storage

Building a High IOPS Flash Array: A Software-Defined Approach

NVMFS: A New File System Designed Specifically to Take Advantage of Nonvolatile Memory

Developing Low Latency NVMe Systems for HyperscaleData Centers. Prepared by Engling Yeo Santa Clara, CA Date: 08/04/2017

Write On Aws. Aws Tools For Windows Powershell User Guide using the aws tools for windows powershell (p. 19) this section includes information about

An NVMe-based FPGA Storage Workload Accelerator

A Non-Relational Storage Analysis

HPE ProLiant Gen10. Franz Weberberger Presales Consultant Server

Next-Generation Cloud Platform

Transcription:

Cloud Computing with FPGA-based NVMe SSDs Bharadwaj Pudipeddi, CTO NVXL Santa Clara, CA 1

Choice of NVMe Controllers ASIC NVMe: Fully off-loaded, consistent performance, M.2 or U.2 form factor ASIC OpenChannel: Host-controlled, partially off-loaded, low cost, M.2 or U.2 form factor FPGA NVMe: Fully off-loaded, consistent performance, multi-function, U.2 or AIC form factor A unique feature of FPGA controllers is multi-functionality the ability to reconfigure for both storage and acceleration. Santa Clara, CA 2

Utilization Factor in Cloud Servers When applications cannot use all the IOPs nor the capacity, Is fixed function SSD the right choice in all cases? Net Utilization Santa Clara, CA 3

NVMe Performance Scalability A large capacity (say 24TB) server requires many SSDs anywhere from 3 to 24. Here is a server with 10 to 12 NVMe drives: QPI RAM Xeon E5 Xeon E5 RAM NVMe SSD SSD SSD SSD SSD SSD SSD SSD SSD SSD Santa Clara, CA

Millions of IOPs Bandwidth in GB/Sec Problem: System-level IOPs constrained by CPU 9000 8000 7000 6000 5000 4000 3000 2000 1000 0 Theoretical IOPs CPU IOPs Unused DeviceBW 1 2 3 4 5 6 7 8 9 10 Number of 800K IOPs NVMe SSDs (1 TB per SSD) 14 12 10 8 6 4 2 0 Santa Clara, CA 5

NVMe Application Spectrum In the simplified 2x2, ASICs fit all quadrants for continuous, dedicated usage. FPGA SSD-OR-Accelerator Multi-Function (Acceleration + Storage) FPGA SSD-with-Accelerator FPGAs fit variable workloads in cloud and data analytics where acceleration and storage are often both in need. Instance (Cloud) OpenChannel ASIC SSD ASIC SSD Permanent (DB, Analytics) Single-Function (Storage-only) Santa Clara, CA 6

FPGA-based Controller Reconfigurable multi-function device (U.2 form factor) RTL-optimized acceleration and performance SCM-class latencies (sub-6 usec) by using high bandwidth memory designed for accelerator (SuperRAM) Compute acceleration ranging from 700 GFLOPs to 8 TFLOPs On-demand provisioning by Software Abstraction Layer (NAL) On-demand Provisioning SSD ACCEL Santa Clara, CA 7

Scaling in the box: Grouping sets of FPGAs for Tasks Container(0) Container(1) Container(2) Container(n-1) Cloud applications Cassandra Caffe2 Tensor Flow AeroSpike Configuration & libraries NAL Abstraction Layer FPGA Groups Group(1) F F F Group(2) F F Group(g) F F F F... 24x Santa Clara, CA

Cloud SKU Compaction from AWS Four popular AWS performance SKUs: f1 (fpga), i3 (NVMe), x1 (inmemory), g2 (GPU) With NVXL provisioning, all SKUs can be consolidated into 1 SKU Better utilization, easier management f1 i3 x1 g2 Up to 100+ TFLOPs, 10M IOPs, 72 TB, SCM-class 1TB of NVMe memory (1W1Q < 4 usec latency) Santa Clara, CA 9

Different modes against AWS i3 and f1 SKUs Santa Clara, CA 10

Application Benchmarks: Aerospike Aerospike is an In-Memory Key-Value Database designed for DRAM and Flash Hybrid architecture keeps index in DRAM and records in SSD via Write Buffer Write buffer is flushed when full for large updates to SSD for even wear AeroSpike is ideally suited for acceleration with NVXL platform by using module SuperRAM (each with 24GB of memory) for Write Buffers and Record caching Certain Aerospike features can also be accelerated by FPGAs. Examples: Group Bys, and Joins. Benchmark: YCSB (Yahoo Cloud Server Benchmarks) Against: Full In-Memory, Intel SSD, and NVXL SuperRAM. Santa Clara, CA 11

YCSB Runtime (Lower is better) Workload Read/Write Mix A 50/50 B 95/5 C 100/0 D Read Latest E RMW NVXL SSD cuts latency by half over pure SSD due to a giant smart cache ( borrowed from accelerator mode) Santa Clara, CA 12

YCSB Throughput (Higher is better) Workload Read/Write Mix A 50/50 B 95/5 C 100/0 D Read Latest E RMW With just two modules, NVXL SSD has even better throughput than in-memory and more predictable performance. Santa Clara, CA 13

Searching Images with Elastic Search ElasticSearch (ES) is an Apache Lucene-based distributed string search engine using schema-free JSON documents (extremely popular in retail and services) Application: Connect a Deep Learning inference model such as Caffe/AlexNet to ES for automatic tagging of anonymous images Note: indexing is normally infrequent and bursty During indexing, AlexNet is run on provisioned DL FPGA group (1) while ES uses Group(2) After indexing, Group(1) is dismantled and recirculated for other applications AlexNet (Caffe) {Label = Dog Rat Terrier } ES Index Group(1) F F F Group(2) F F Santa Clara, CA 14

ElasticSearch Results Indexing speedup of 83x Fully optimized: 1000x speedup! Query processing: 40% faster latency TCO benefits: Low power, high utilization NVXL Similar approach can also be used for search-by-image and ES or in-memory DB for latency about sub-5 msec (5x faster than CPU). NVXL Santa Clara, CA 15

Conclusion We present a new class of cloud server using multi-function devices in NVMe U.2 form factor The device is designed for storage and acceleration NVMe Storage benefits from SCM-class latencies due to richer memory resources Software layer provides grouping and reconfigurability features enabling a wide range of cloud applications from NoSQL to Deep Learning TCO benefits from this server cut data center costs for performance instances by 50% through increased utilization and SKU consolidation Santa Clara, CA 16