Designing High-Performance Non-Volatile Memory-aware RDMA Communication Protocols for Big Data Processing

Talk at the Storage Developer Conference (SDC), SNIA 2018

Xiaoyi Lu, The Ohio State University
E-mail: luxi@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~luxi

Dhabaleswar K. (DK) Panda, The Ohio State University
E-mail: panda@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~panda

Big Data Management and Processing on Modern Clusters
- Substantial impact on designing and utilizing data management and processing systems in multiple tiers:
- Front-end data accessing and serving (online): Memcached + DB (e.g., MySQL), HBase
- Back-end data analytics (offline): HDFS, MapReduce, Spark
[Figure: front-end tier (web servers, Memcached + MySQL, NoSQL DB/HBase) serving data from the Internet, and back-end tier (HDFS, MapReduce, Spark) running analytics apps/jobs]

Big Data Processing with Apache Big Data Analytics Stacks
- Major components: MapReduce (batch), Spark (iterative and interactive), HBase (query), HDFS (storage), RPC (inter-process communication)
- The underlying Hadoop Distributed File System (HDFS) is used by MapReduce, Spark, HBase, and many others
- The model scales, but the large amount of communication and I/O can be further optimized!
[Figure: Apache Big Data Analytics stack — user applications over MapReduce/Spark/HBase, over Hadoop Common (RPC), over HDFS]

Drivers of Modern HPC Cluster and Data Center Architecture
- Multi-core/many-core processors
- Remote Direct Memory Access (RDMA)-enabled networking (InfiniBand and RoCE); InfiniBand (with SR-IOV) offers <1 usec latency and 200 Gbps bandwidth
- Single Root I/O Virtualization (SR-IOV)
- Accelerators/coprocessors (NVIDIA GPGPUs and FPGAs): high compute density, high performance/watt, >1 TFlop DP on a chip
- Storage: SSD, NVMe-SSD, NVRAM
[Examples pictured: SDSC Comet, TACC Stampede, cloud deployments]

The High-Performance Big Data (HiBD) Project
- RDMA for Apache Spark
- RDMA for Apache Hadoop 2.x (RDMA-Hadoop-2.x), with plugins for Apache, Hortonworks (HDP), and Cloudera (CDH) Hadoop distributions
- RDMA for Apache HBase
- RDMA for Memcached (RDMA-Memcached)
- RDMA for Apache Hadoop 1.x (RDMA-Hadoop)
- OSU HiBD-Benchmarks (OHB): HDFS, Memcached, HBase, and Spark micro-benchmarks
- http://hibd.cse.ohio-state.edu
- User base: 290 organizations from 34 countries; more than 27,800 downloads from the project site
- Available for InfiniBand and RoCE, on x86 and OpenPOWER
- Significant performance improvement with RDMA+DRAM compared to default sockets-based designs; how about RDMA+NVRAM?

Non-Volatile Memory (NVM) and NVMe-SSD
- Non-Volatile Memory (NVM) provides byte-addressability with persistence
- The huge explosion of data in diverse fields requires fast analysis and storage
- NVMs provide the opportunity to build high-throughput storage systems for data-intensive applications
- Storage technology is moving rapidly towards NVM
[Figures: 3D XPoint from Intel & Micron; Samsung NVMe SSD; performance of PMC Flashtec NVRAM [*]]
[*] http://www.enterprisetech.com/2014/08/06/flashtec-nvram-15-million-iops-sub-microsecond-latency/

NVRAM Emulation based on DRAM
- Popular methods employed by recent works emulate an NVRAM performance model over DRAM, in two ways:
- Emulate a block-based NVM device over DRAM: application -> virtual file system (open/read/write/close) -> block device -> PCMDisk (RAM-disk + delay) over DRAM
- Emulate byte-addressable NVRAM over DRAM: application -> persistent memory library (pmem_memcpy_persist; mmap/memcpy/msync with DAX; load/store) -> clflush + delay over DRAM (a minimal sketch of this approach follows below)
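
To make the byte-addressable emulation path concrete, here is a minimal C sketch of the clflush+delay approach, assuming an x86 CPU with the CLFLUSHOPT instruction (compile with -mclflushopt). The function names, the slowdown parameter, and the cycle counts are illustrative assumptions, not values or code from the talk.

```c
/* Minimal sketch of clflush+delay NVRAM emulation over DRAM (illustrative). */
#include <stddef.h>
#include <stdint.h>
#include <immintrin.h>   /* _mm_clflushopt, _mm_sfence, _mm_pause */
#include <x86intrin.h>   /* __rdtsc */

#define CACHE_LINE 64

static inline void busy_wait_cycles(uint64_t cycles) {
    uint64_t start = __rdtsc();
    while (__rdtsc() - start < cycles)
        _mm_pause();
}

/* Flush the written range from the caches, then add an artificial delay
 * proportional to the number of flushed lines to model the assumed NVRAM
 * write slowdown relative to DRAM (e.g., 2x or 5x). */
static void emulated_nvram_persist(const void *addr, size_t len,
                                   uint64_t extra_cycles_per_line) {
    uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines  = 0;
    for (; p < end; p += CACHE_LINE, lines++)
        _mm_clflushopt((void *)p);
    _mm_sfence();                                    /* order the flushes   */
    busy_wait_cycles(lines * extra_cycles_per_line); /* emulated extra cost */
}
```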

Presentation Outline
- NRCIO: NVM-aware RDMA-based Communication and I/O Schemes
- NRCIO for Big Data Analytics
- NVMe-SSD based Big Data Analytics
- Conclusion and Q&A

Design Scope (NVM for RDMA)
- D-to-D over RDMA: communication buffers for client and server are allocated in DRAM (the common case)
- D-to-N over RDMA: communication buffers for the client are allocated in DRAM; the server uses NVM
- N-to-D over RDMA: communication buffers for the client are allocated in NVM; the server uses DRAM
- N-to-N over RDMA: communication buffers for client and server are allocated in NVM
[Figure: for each mode, an HDFS-RDMA client (RDMADFSClient) and server (RDMADFSServer) exchange data across CPU, PCIe, and NIC, with buffers placed in DRAM or NVM accordingly]
(An illustrative NVM buffer-registration sketch follows.)
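
As a concrete illustration of how the N-to-D and N-to-N modes expose NVM to the RDMA path, the following C sketch maps an NVM-backed (DAX) file and registers it as an RDMA memory region with libibverbs. The path, sizes, and helper names are assumptions for illustration, not code from the talk, and error handling is omitted.

```c
/* Sketch: exposing an NVM-backed buffer to RDMA (illustrative, no error handling). */
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <infiniband/verbs.h>

/* Map a file on a DAX-mounted persistent-memory file system so that
 * loads/stores (and NIC DMA) reach NVM directly. */
static void *map_nvm_buffer(const char *dax_path, size_t len) {
    int fd = open(dax_path, O_RDWR);   /* e.g. "/mnt/pmem/rdma_buf" (assumed path) */
    return mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

/* Register the NVM buffer exactly as a DRAM buffer would be registered;
 * the remote side can then RDMA-read/write it in the N-to-D / N-to-N modes. */
static struct ibv_mr *register_nvm_mr(struct ibv_pd *pd, void *buf, size_t len) {
    return ibv_reg_mr(pd, buf, len,
                      IBV_ACCESS_LOCAL_WRITE |
                      IBV_ACCESS_REMOTE_READ |
                      IBV_ACCESS_REMOTE_WRITE);
}
```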

NVRAM-aware RDMA-based Communication in NRCIO
[Figures: NRCIO RDMA Write over NVRAM; NRCIO RDMA Read over NVRAM (protocol diagrams)]
(A hedged sketch of such a write path follows.)
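
The protocol diagrams did not survive transcription, so here is a hedged C sketch of the kind of write path an NVRAM-aware RDMA protocol such as NRCIO could use: the client RDMA-writes into the server's NVRAM-backed region, and the server flushes the range before acknowledging, so the data is durable rather than merely delivered. This is an assumed structure for illustration, not the authors' implementation.

```c
/* Sketch of an RDMA write into a remote NVRAM region (illustrative). */
#include <stdint.h>
#include <infiniband/verbs.h>

static int post_rdma_write_to_nvram(struct ibv_qp *qp, struct ibv_mr *local_mr,
                                    uint64_t remote_addr, uint32_t rkey,
                                    uint32_t len) {
    struct ibv_sge sge = {
        .addr   = (uint64_t)(uintptr_t)local_mr->addr,
        .length = len,
        .lkey   = local_mr->lkey,
    };
    struct ibv_send_wr wr = {
        .wr_id      = 0x1,
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    struct ibv_send_wr *bad_wr = NULL;
    return ibv_post_send(qp, &wr, &bad_wr);
    /* Server side (not shown): once notified that the data has arrived, it
     * flushes the written range (clflushopt + sfence, or pmem_persist) and
     * only then sends an acknowledgment, making the write persistent. */
}
```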

DRAM-to-NVRAM RDMA-Aware Communication with NRCIO
[Charts: latency (us) of NRCIO-RW and NRCIO-RR for small messages (256 B to 16 KB) and latency (ms) for large messages (256 KB to 4 MB), under 1xDRAM, 2xDRAM, and 5xDRAM emulation modes]
- Comparison of communication latency using the NRCIO RDMA read and write protocols over an InfiniBand EDR HCA, with DRAM as the source and NVRAM as the destination
- {NxDRAM} NVRAM emulation mode = Nx NVRAM write slowdown vs. DRAM, emulated with clflushopt + sfence
- Smaller impact of time-for-persistence on end-to-end latencies for small messages than for large messages, which have a larger number of cache lines to flush

NVRAM-to-NVRAM RDMA-Aware Communication with NRCIO
[Charts: latency (us) of NRCIO-RW and NRCIO-RR for small messages (64 B to 16 KB) and latency (ms) for large messages (256 KB to 4 MB), for No Persist (D2D), {1x,2x}, and {2x,5x} emulation modes]
- Comparison of communication latency using the NRCIO RDMA read and write protocols over an InfiniBand EDR HCA vs. DRAM
- {Ax,Bx} NVRAM emulation mode = Ax NVRAM read slowdown and Bx NVRAM write slowdown vs. DRAM
- Higher end-to-end latencies due to slower writes to non-volatile persistent memory, e.g., 3.9x for {1x,2x} and 8x for {2x,5x}

Presentation Outline
- NRCIO: NVM-aware RDMA-based Communication and I/O Schemes
- NRCIO for Big Data Analytics
- NVMe-SSD based Big Data Analytics
- Conclusion and Q&A

Opportunities of Using NVRAM+RDMA in HDFS
- Files are divided into fixed-sized blocks; blocks are divided into packets
- NameNode: stores the file system namespace
- DataNode: stores data blocks in local storage devices
- Uses block replication for fault tolerance; replication enhances data locality and read throughput
- Communication and I/O intensive: Java sockets based communication; data needs to be persistent, typically on SSD/HDD
[Figure: client interacting with the NameNode and DataNodes]

Design Overview of NVM and RDMA-aware HDFS (NVFS)
[Architecture figure: applications and benchmarks (Hadoop MapReduce, Spark, HBase) over NVFS; DFSClient with RDMA sender/receiver; DataNode with RDMA receiver, RDMA writer/reader, and replicator; NVFS-BlkIO and NVFS-MemIO paths over NVM, with SSDs as secondary storage; co-design for cost-effectiveness and use-cases]
Design features:
- RDMA over NVM
- HDFS I/O with NVM: block access (NVFS-BlkIO) and memory access (NVFS-MemIO)
- Hybrid design: NVM with SSD as hybrid storage for HDFS I/O
- Co-design with Spark and HBase, driven by cost-effectiveness and use-cases
N. S. Islam, M. W. Rahman, X. Lu, and D. K. Panda, "High Performance Design for HDFS with Byte-Addressability of NVM and RDMA," 24th International Conference on Supercomputing (ICS), June 2016.

Evaluation with Hadoop MapReduce
[Charts: average TestDFSIO throughput (MBps) for write and read with HDFS, NVFS-BlkIO, and NVFS-MemIO (56 Gbps) on SDSC Comet (32 nodes) and OSU Nowlab (4 nodes)]
- TestDFSIO on SDSC Comet (32 nodes): write: NVFS-MemIO gains 4x over HDFS; read: NVFS-MemIO gains 1.2x over HDFS
- TestDFSIO on OSU Nowlab (4 nodes): write: NVFS-MemIO gains 4x over HDFS; read: NVFS-MemIO gains 2x over HDFS

Evaluation with HBase
[Charts: throughput (ops/s) for HDFS vs. NVFS (56 Gbps) across cluster size : number of records (8:800K, 16:1600K, 32:3200K), for HBase 100% insert and 50% read / 50% update workloads]
- YCSB 100% insert on SDSC Comet (32 nodes): NVFS-BlkIO gains 21% by storing only WALs in NVM
- YCSB 50% read, 50% update on SDSC Comet (32 nodes): NVFS-BlkIO gains 20% by storing only WALs in NVM

Opportunities to Use NVRAM+RDMA in MapReduce
- Map and reduce tasks carry out the total job execution
- Map tasks read from HDFS, operate on the data, and write the intermediate data to local disk (persistent)
- Reduce tasks get these data via shuffle from NodeManagers, operate on them, and write to HDFS (persistent)
- Communication and I/O intensive: the shuffle phase uses HTTP over Java sockets; I/O operations typically take place on SSD/HDD
[Figure: bulk data transfer and disk operations in the MapReduce data flow]

Opportunities to Use NVRAM in MapReduce-RDMA Design
[Figure: map tasks (read input files, map, spill, merge) produce intermediate data; reduce tasks perform RDMA shuffle, in-memory merge, and reduce to output files]
- Opportunities exist to improve the performance with NVRAM
- All operations are in-memory

NVRAM-Assisted Map Spilling in MapReduce-RDMA
[Figure: map tasks spill and merge intermediate data in NVRAM instead of on local disk; reduce tasks fetch it via RDMA shuffle, then perform in-memory merge and reduce to output files]
- Minimizes the disk operations in the spill phase (a spill-to-NVRAM sketch follows below)
M. W. Rahman, N. S. Islam, X. Lu, and D. K. Panda, "Can Non-Volatile Memory Benefit MapReduce Applications on HPC Clusters?" PDSW-DISCS, held with SC 2016.
M. W. Rahman, N. S. Islam, X. Lu, and D. K. Panda, "NVMD: Non-Volatile Memory Assisted Design for Accelerating MapReduce and DAG Execution Frameworks on HPC Systems," IEEE BigData 2017.
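
To make the spill-to-NVRAM idea concrete, here is a minimal C sketch using libpmem, assuming a DAX-mounted persistent-memory file holds the intermediate data; the path, offsets, and function names are illustrative assumptions, not taken from the RMR-NVM implementation.

```c
/* Sketch: spilling intermediate map output to persistent memory via libpmem. */
#include <stddef.h>
#include <libpmem.h>

/* Create/map a spill file on a DAX-mounted pmem device (assumed path). */
static void *open_spill_region(const char *path, size_t len, size_t *mapped_len) {
    int is_pmem = 0;
    return pmem_map_file(path, len, PMEM_FILE_CREATE, 0666, mapped_len, &is_pmem);
}

/* Copy one serialized spill partition into NVRAM and make it persistent in a
 * single call; no write()/fsync() to SSD/HDD is involved. */
static void spill_partition(void *pmem_base, size_t offset,
                            const void *kv_buf, size_t kv_len) {
    pmem_memcpy_persist((char *)pmem_base + offset, kv_buf, kv_len);
}
```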

Comparison with Sort and TeraSort
[Charts: execution time of Sort and TeraSort for MR-IPoIB, RMR, and RMR-NVM]
- RMR-NVM achieves a 2.37x benefit for the map phase compared to RMR and MR-IPoIB; overall benefit of 55% compared to MR-IPoIB and 28% compared to RMR
- RMR-NVM achieves a 2.48x benefit for the map phase compared to RMR and MR-IPoIB; overall benefit of 51% compared to MR-IPoIB and 31% compared to RMR

Evaluation of Intel HiBench Workloads
- We evaluate different HiBench workloads with Huge data sets on 8 nodes
- Performance benefits for shuffle-intensive workloads compared to MR-IPoIB: Sort: 42% (25 GB); TeraSort: 39% (32 GB); PageRank: 21% (5 million pages)
- Other workloads: WordCount: 18% (25 GB); KMeans: 11% (100 million samples)

Evaluation of PUMA Workloads
- We evaluate different PUMA workloads on 8 nodes with a 30 GB data size
- Performance benefits for shuffle-intensive workloads compared to MR-IPoIB: AdjList: 39%; SelfJoin: 58%; RankedInvIndex: 39%
- Other workloads: SeqCount: 32%; InvIndex: 18%

Presentation Outline
- NRCIO: NVM-aware RDMA-based Communication and I/O Schemes
- NRCIO for Big Data Analytics
- NVMe-SSD based Big Data Analytics
- Conclusion and Q&A

Overview of the NVMe Standard
- NVMe is the standardized interface for PCIe SSDs
- Built on RDMA principles: submission and completion I/O queues with similar semantics to RDMA send/recv queues
- Asynchronous command processing
- Up to 64K I/O queues, with up to 64K commands per queue
- Efficient small random I/O operations
- MSI/MSI-X and interrupt aggregation
[Figure: NVMe command processing. Source: NVMExpress.org]
(An asynchronous I/O example using SPDK follows.)
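
As an illustration of the asynchronous submission/completion-queue model described above, the sketch below issues a single read through SPDK's user-space NVMe driver and polls for its completion. Controller and namespace setup are omitted and the helper names are assumptions; the SPDK calls themselves (spdk_nvme_ns_cmd_read, spdk_nvme_qpair_process_completions) are standard API.

```c
/* Sketch: one asynchronous NVMe read via SPDK's submission/completion queues. */
#include <stdbool.h>
#include <stdint.h>
#include <spdk/nvme.h>

static void read_done(void *arg, const struct spdk_nvme_cpl *cpl) {
    /* Completion-queue callback; real code should also check the status in cpl. */
    *(bool *)arg = true;
}

static void read_one_block(struct spdk_nvme_ns *ns, struct spdk_nvme_qpair *qpair,
                           void *buf, uint64_t lba) {
    /* buf must be DMA-able memory, e.g. allocated with spdk_dma_zmalloc(). */
    bool done = false;
    /* Enqueue the command on the submission queue; the call returns immediately. */
    spdk_nvme_ns_cmd_read(ns, qpair, buf, lba, 1 /* lba_count */,
                          read_done, &done, 0 /* io_flags */);
    /* Poll the completion queue until our callback has fired. */
    while (!done)
        spdk_nvme_qpair_process_completions(qpair, 0);
}
```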

Overview of NVMe-over-Fabrics
- Remote access to flash with NVMe over the network
- The RDMA fabric is of most importance: its low latency makes remote access feasible
- 1-to-1 mapping of NVMe I/O queues to RDMA send/recv queues
- Low latency overhead compared to local I/O
[Figure: NVMf architecture, mapping the NVMe I/O submission queue (SQ) and completion queue (CQ) onto the RDMA fabric]
(A connection sketch follows.)
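
To show what the 1-to-1 queue mapping looks like from the host side, here is a hedged sketch of attaching to a remote NVMe-oF target over an RDMA fabric with SPDK; the address, service ID, and NQN are placeholders. After connecting, the same spdk_nvme_ns_cmd_* calls from the previous sketch operate on the remote namespace, with each I/O queue pair carried over an RDMA queue pair.

```c
/* Sketch: connecting to an NVMe-over-Fabrics target via RDMA with SPDK. */
#include <spdk/nvme.h>

static struct spdk_nvme_ctrlr *connect_nvmf_rdma(void) {
    struct spdk_nvme_transport_id trid = {0};
    /* Placeholder address/service/NQN; adjust to the actual target. */
    spdk_nvme_transport_id_parse(&trid,
        "trtype:RDMA adrfam:IPv4 traddr:192.168.1.10 "
        "trsvcid:4420 subnqn:nqn.2018-09.io.example:nvmf-target");
    /* Each I/O queue pair allocated on this controller is backed by an RDMA QP. */
    return spdk_nvme_connect(&trid, NULL, 0);
}
```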

Design Challenges with NVMe-SSD
- QoS: hardware-assisted QoS
- Persistence: flushing buffered data
- Performance: consider flash-related design aspects such as read/write performance skew and garbage collection
- Virtualization: SR-IOV hardware support, namespace isolation
- New software systems: disaggregated storage with NVMf, persistent caches, co-design

Evaluation with RocksDB (Latency)
[Charts: latency (us) of Insert, Overwrite, Random Read, Write Sync, and Read Write with POSIX vs. SPDK backends]
- 20%, 33%, and 61% improvement for Insert, Write Sync, and Read Write, respectively
- Overwrite: compaction and flushing happen in the background, so there is low potential for improvement
- Read: performance is much worse; additional tuning/optimization required

Evaluation with RocksDB (Throughput)
[Charts: throughput (ops/sec) of Insert, Overwrite, Random Read, Write Sync, and Read Write with POSIX vs. SPDK backends]
- 25%, 50%, and 160% improvement for Insert, Write Sync, and Read Write, respectively
- Overwrite: compaction and flushing happen in the background, so there is low potential for improvement
- Read: performance is much worse; additional tuning/optimization required

QoS-aware SPDK Design
[Charts: bandwidth (MB/s) over time for high- and medium-priority jobs under SPDK WRR vs. the OSU design (Scenario 1), and job bandwidth ratios across synthetic application scenarios 2-5 for SPDK-WRR, OSU-Design, and the desired ratio]
- Synthetic application scenarios with different QoS requirements
- Comparison against SPDK with Weighted Round Robin (WRR) NVMe arbitration
- Near-desired job bandwidth ratios; stable and consistent bandwidth
S. Gugnani, X. Lu, and D. K. Panda, "Analyzing, Modeling, and Provisioning QoS for NVMe SSDs," (under review).

Conclusion and Future Work
- Big Data analytics needs high-performance NVM-aware RDMA-based communication and I/O schemes
- Proposed a new library, NRCIO (work in progress)
- Re-designed the HDFS storage architecture with NVRAM
- Re-designed RDMA-MapReduce with NVRAM
- Designed Big Data analytics stacks with NVMe and NVMf protocols
- Results are promising
- Future work: further optimizations in NRCIO; co-design with more Big Data analytics frameworks (TensorFlow, object storage, databases, etc.)

Thank You!
luxi@cse.ohio-state.edu
http://www.cse.ohio-state.edu/~luxi
Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/
The High-Performance Big Data Project: http://hibd.cse.ohio-state.edu/