BlueDBM: An Appliance for Big Data Analytics*


Arvind
*[ISCA, 2015] Sang-Woo Jun, Ming Liu, Sungjin Lee, Shuotao Xu, Arvind (MIT) and Jamey Hicks, John Ankcorn, Myron King (Quanta)
BigData@CSAIL Annual Meeting, November 6, 2015

Big data analytics
Analysis of previously unimaginable amounts of data can provide deep insight:
  Google has predicted flu outbreaks a week earlier than the Centers for Disease Control and Prevention (CDC)
  Analyzing a personal genome can determine predisposition to diseases
  Social network chatter analysis can identify political revolutions before newspapers
  Scientific datasets can be mined to extract accurate models
Likely to be the biggest economic driver for the IT industry for the next decade

A currently popular solution: RAM Cloud
Cluster of machines with large DRAM capacity and fast interconnect
  + Fastest as long as data fits in DRAM
  - Power hungry and expensive
  - Performance drops when data doesn't fit in DRAM
What if enough DRAM isn't affordable?
Flash-based solutions may be a better alternative
  + Faster than disk, cheaper than DRAM
  + Lower power consumption than both
  - Legacy storage access interface is burdensome
  - Slower than DRAM

Latency profile of distributed flash-based analytics
Distributed processing involves many system components:
  Flash device access
  Storage software (OS, FTL, ...)
  Network interface (10GbE, InfiniBand, ...)
  Actual processing
[Figure: latency per component]
  Flash access        75 μs    (50~100 μs)
  Storage software    100 μs   (100~1000 μs)
  Network             20 μs    (20~1000 μs)
  Processing          (no value shown)
Latency is additive
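The "latency is additive" point can be made concrete with a short calculation. The sketch below sums the per-component figures from the latency slide. The pairing of each number with a component is an assumption reconstructed from the slide layout, and the names `LATENCY_US` and `total_latency_us` are hypothetical. Processing time depends on the application, so it is passed in separately.

```python
# Latency per component in microseconds: (low, typical, high).
# Assumption: the component-to-number pairing is reconstructed from the slide.
LATENCY_US = {
    "flash_access": (50, 75, 100),
    "storage_software": (100, 100, 1000),
    "network": (20, 20, 1000),
}


def total_latency_us(components, processing_us=0):
    """Add up per-component latencies and return (low, typical, high) totals."""
    low = sum(c[0] for c in components.values()) + processing_us
    typical = sum(c[1] for c in components.values()) + processing_us
    high = sum(c[2] for c in components.values()) + processing_us
    return low, typical, high


low, typical, high = total_latency_us(LATENCY_US)
print(f"overhead before processing: {low}~{high} us, typical {typical} us")
```

Under these assumptions, storage software and the network account for 120 μs of the typical 195 μs, which is more than the flash access itself.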

Latency profile of distributed flash-based analytics
Architectural modifications can remove unnecessary overhead:
  Near-storage processing
  Cross-layer optimization of flash management software *
  Dedicated storage area network
[Figure: latency profile after the modifications. Labels: flash access 75 μs (50~100 μs), network < 20 μs, accelerator]
Difficult to explore using flash packaged as off-the-shelf SSDs

Custom flash card had to be built
[Figure: custom flash card. Labels: to VC707, HPC FMC port, Artix 7 FPGA, Bus 0 to Bus 3, network ports, flash array (on both sides)]

BlueDBM: Platform with near-storage processing and inter-controller networks
20 24-core Xeon servers
20 BlueDBM storage devices
  1 TB flash storage
  x4 20 Gbps controller network
  Xilinx VC707
  2 GB/s PCIe
[Figure: photos labeled "1 of 2 Racks (10 Nodes)" and "BlueDBM Storage Device"]
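A rough calculation shows why the platform puts processing near the storage. The sketch below assumes that the 1 TB and 2 GB/s figures on the slide are per storage device, that 1 TB equals 1000 GB, and that the PCIe link is fully utilized. The function names are hypothetical.

```python
NODES = 20                 # storage devices in the cluster (from the slide)
FLASH_TB_PER_DEVICE = 1    # assumed to be per device
PCIE_GB_PER_S = 2          # host link bandwidth, assumed to be per device


def cluster_capacity_tb(nodes=NODES, tb_per_device=FLASH_TB_PER_DEVICE):
    """Total flash capacity across all storage devices, in TB."""
    return nodes * tb_per_device


def min_host_scan_seconds(tb=FLASH_TB_PER_DEVICE, gb_per_s=PCIE_GB_PER_S):
    """Lower bound on the time to move `tb` terabytes to the host over PCIe."""
    return tb * 1000 / gb_per_s


print(cluster_capacity_tb(), "TB total")
print(min_host_scan_seconds(), "s to ship one device's data to its host")
```

Under these assumptions, a full scan that ships every byte to the host CPU takes at least several minutes per device, whereas an in-storage processor can filter data before it crosses the PCIe link.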

BlueDBM node architecture
[Diagram labels: device, controller, in-storage processor, PCIe interface, host server]
Lightweight flash management with very low overhead
  Adds almost no latency
  ECC support
Custom network protocol with low latency/high bandwidth
  x4 20 Gbps links at 0.5 μs latency
  Virtual channels with flow control
Software has very low level access to flash storage
  High level information can be used for low level management
  FTL implemented inside file system

Power consumption is low
  Component               Power (Watts)
  VC707                   30
  Flash board (x2)        10
  Storage device total    40
Storage device power consumption is a very conservative estimate
  Component               Power (Watts)
  Storage device          40
  Xeon server             200+
  Node total              240+
GPU-based accelerator will double the power
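The power tables can be checked with a few lines of arithmetic. The sketch below uses the slide's figures and treats "200+" as a lower bound of 200 W. The 20-node count comes from the platform slide, and the names are hypothetical.

```python
# Watts per component of one storage device (from the slide).
# Assumption: the 10 W entry covers both flash boards together.
STORAGE_DEVICE_W = {"vc707": 30, "flash_boards_x2": 10}
XEON_SERVER_W = 200   # slide says "200+", so this is a lower bound
NODES = 20            # from the platform slide


def storage_device_power_w(parts=STORAGE_DEVICE_W):
    """Total power of one storage device."""
    return sum(parts.values())


def node_power_w(server_w=XEON_SERVER_W):
    """Lower bound on the power of one node (server plus storage device)."""
    return server_w + storage_device_power_w()


def storage_share_of_node():
    """Fraction of node power drawn by the storage device."""
    return storage_device_power_w() / node_power_w()


print(storage_device_power_w(), "W per storage device")
print(node_power_w(), "W per node (lower bound)")
print(f"{storage_share_of_node():.1%} of node power is the storage device")
print(NODES * node_power_w(), "W for the cluster (lower bound)")
```

With these numbers the storage device adds at most about one sixth to the power of a node, since the server figure is a lower bound.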

Applications
High-dimensional nearest neighbor search *
  Faster flash with accelerators as replacement for DRAM-based systems
BlueCache: an accelerated memcached *
  Dedicated network and accelerated caching systems with larger capacity
Graph analytics
  Benefits of lower latency access into distributed flash for computation on large graphs
* Results obtained since the paper submission

Image search accelerator
Sang-Woo Jun, Chanwoo Chung
[Figure: image search performance. Labels: BlueDBM + FPGA, CPU bottleneck, BlueDBM + CPU, off-the-shelf M.2 SSD]
Faster flash with acceleration can perform at DRAM speed

BlueCache: Accelerated memcached service
Shuotao Xu
[Figure: throughput (KOps per second, 0 to 350) versus cache misses (%, 0 to 50). Series: BlueCache, Memcached + local DRAM. Key size = 64 bytes, value size = 8 KB, 5 ms penalty per cache miss. * Assuming no cache misses for BlueCache]
High cache-hit rate outweighs slow flash accesses (small DRAM vs. large flash)

Graph traversal performance
[Figure: nodes traversed per second (0 to 18000). Bars: Software + DRAM, Software + Separate Network, Software + Controller Network, Accelerator + Controller Network. Note on the DRAM bar: all DRAM accesses are remote, but use the BlueDBM network as opposed to Ethernet]
Flash-based system can achieve comparable performance with a much smaller cluster
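The BlueCache slide argues that a large, slower cache that rarely misses can beat a small, fast cache that misses often. A minimal sketch of that trade-off follows, using the slide's 5 ms miss penalty. The closed-loop throughput model, the function names, and the example hit times (10 μs for DRAM, 110 μs for flash) are illustrative assumptions, not measurements from the talk.

```python
MISS_PENALTY_S = 5e-3  # 5 ms per cache miss, as stated on the slide


def avg_request_time_s(hit_time_s, miss_rate, miss_penalty_s=MISS_PENALTY_S):
    """Expected time per request: every request pays hit_time_s, misses add the penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_s + miss_rate * miss_penalty_s


def throughput_ops(hit_time_s, miss_rate, in_flight=1):
    """Requests per second with `in_flight` outstanding requests (assumed model)."""
    return in_flight / avg_request_time_s(hit_time_s, miss_rate)


def break_even_miss_rate(dram_hit_s, flash_hit_s, miss_penalty_s=MISS_PENALTY_S):
    """Miss rate above which a DRAM cache is slower than a flash cache that never misses."""
    return max(0.0, (flash_hit_s - dram_hit_s) / miss_penalty_s)


# Hypothetical hit times, chosen only to illustrate the shape of the curve.
DRAM_HIT_S = 10e-6
FLASH_HIT_S = 110e-6

for miss_pct in (0, 1, 2, 5, 10):
    ops = throughput_ops(DRAM_HIT_S, miss_pct / 100)
    print(f"{miss_pct:>2}% misses: {ops:,.0f} ops/s per outstanding request")
print(f"break-even miss rate: {break_even_miss_rate(DRAM_HIT_S, FLASH_HIT_S):.1%}")
```

Because the miss penalty is orders of magnitude larger than either hit time, a DRAM cache in this model loses its advantage at a miss rate of only a few percent.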

Conclusion
Fast flash-based distributed storage systems with low-latency random access may be a good platform to support complex queries on Big Data
Reducing access latency for distributed storage requires architectural modifications, including in-storage processors and fast storage networks
Flash-based analytics hold a lot of promise, and we plan to continue demonstrating more application acceleration
Thank you