Sunrise or Sunset: Exploring the Design Space of Big Data Software Stacks

1 Sunrise or Sunset: Exploring the Design Space of Big Data Software Stacks. Panel Presentation at HPBDC '17 by Dhabaleswar K. (DK) Panda, The Ohio State University. E-mail: panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda

2 Q1: Are Big Data Software Stacks Mature or Not? Big Data software stacks like Hadoop, Spark, and Memcached have been around for multiple years: Hadoop 11 years (Apache Hadoop 0.1.0 released in April 2006), Spark 5 years (Apache Spark 0.5.1 released in June 2012), Memcached 14 years (initial release of Memcached on May 22, 2003). They are increasingly being used in production environments and are optimized for commodity clusters with Ethernet and the TCP/IP interface, but they are not yet able to take full advantage of modern cluster and/or HPC technologies.

3 Data Management and Processing on Modern Clusters. Substantial impact on designing and utilizing data management and processing systems in multiple tiers: front-end data accessing and serving (online) with Memcached + DB (e.g., MySQL) and HBase; back-end data analytics (offline) with HDFS, MapReduce, and Spark. [Slide figure: internet traffic enters a front-end tier of web servers backed by Memcached + DB (MySQL) and NoSQL DB (HBase), which feeds a back-end tier running data analytics apps/jobs on MapReduce, Spark, and HDFS.]
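
A minimal sketch of the look-aside caching pattern in the front-end tier above, using the pymemcache client; the query_db() helper is a hypothetical stand-in for the MySQL lookup behind the cache.

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))  # assumes Memcached on localhost:11211

def query_db(key):
    # Hypothetical placeholder for the MySQL query behind the cache.
    return b"row-for-" + key.encode()

def get_record(key):
    value = cache.get(key)                 # fast path: served from Memcached
    if value is None:                      # cache miss: fall back to the DB
        value = query_db(key)
        cache.set(key, value, expire=300)  # repopulate with a 5-minute TTL
    return value

print(get_record("user:42"))
```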

4 Who Are Using Hadoop? Focuses on large data and data analysis. The Hadoop environment (e.g., HDFS, MapReduce, RPC, HBase) is gaining a lot of momentum. http://wiki.apache.org/hadoop/PoweredBy

5 Spark Ecosystem: generalizes MapReduce to support new apps within the same engine. Two key observations: general task support with DAGs; multi-stage and interactive apps require faster data sharing across parallel jobs. [Slide figure: BlinkDB, Spark SQL, Spark Streaming (real-time), GraphX (graph), MLlib (machine learning), and Caffe, TensorFlow, BigDL, etc. (deep learning) layered on Spark, running on Spark Standalone, Apache Mesos, or YARN.]
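
A small PySpark sketch of the two observations above: a multi-stage DAG of transformations, with cache() so successive jobs share data in memory instead of recomputing it (standard Spark API, shown here in Python).

```python
from pyspark import SparkContext

sc = SparkContext(appName="dag-sharing-sketch")

# Stage 1: build a pair RDD and cache it so later jobs reuse the in-memory copy.
words = sc.parallelize(["spark", "hadoop", "spark", "hbase"])
pairs = words.map(lambda w: (w, 1)).cache()

# Two separate jobs share the cached RDD across the shuffle boundary.
counts = pairs.reduceByKey(lambda a, b: a + b).collect()  # job 1: word counts
total = pairs.count()                                     # job 2: record count
print(counts, total)
sc.stop()
```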

6 Who Are Using Spark? Focuses on large data and data analysis with in-memory techniques. Apache Spark is gaining a lot of momentum. http://spark.apache.org/powered-by.html

7 Q2: What Are the Main Driving Forces for New-generation Big Data Software Stacks?

8 Increasing Usage of HPC, Big Data and Deep Learning: HPC (MPI, RDMA, Lustre, etc.), Big Data (Hadoop, Spark, HBase, Memcached, etc.), and Deep Learning (Caffe, TensorFlow, BigDL, etc.). Convergence of HPC, Big Data, and Deep Learning!

9 How Can HPC Clusters with High-Performance Interconnect and Storage Architectures Benefit Big Data and Deep Learning Applications? What are the major bottlenecks in current Big Data processing and Deep Learning middleware (e.g., Hadoop, Spark)? Can the bottlenecks be alleviated with new designs that take advantage of HPC technologies? Can RDMA-enabled high-performance interconnects benefit Big Data processing and Deep Learning? Can HPC clusters with high-performance storage systems (e.g., SSD, parallel file systems) benefit Big Data and Deep Learning applications? How much performance benefit can be achieved through enhanced designs? How do we design benchmarks for evaluating the performance of Big Data and Deep Learning middleware on HPC clusters? The goal: bring HPC, Big Data processing, and Deep Learning onto a convergent trajectory!

10 Can We Run Big Data and Deep Learning Jobs on Existing HPC Infrastructure? (Slides 10-13 repeat this question as an animated build.)

14 Q3: What Opportunities Are There for the Academic Community in Exploring the Design Space of Big Data Software Stacks?

15 Designing Communication and I/O Libraries for Big Data Systems: Challenges. [Slide figure: a layered stack.] Applications and benchmarks run on Big Data middleware (HDFS, MapReduce, HBase, Spark, gRPC/TensorFlow, and Memcached), which today sits on socket-based programming models. Underneath, a communication and I/O library must address RDMA protocols, point-to-point communication, threaded models and synchronization, virtualization (SR-IOV), I/O and file systems, QoS and fault tolerance, and performance tuning, on top of networking technologies (InfiniBand, 1/10/40/100 GigE, and intelligent NICs), commodity computing system architectures (multi- and many-core architectures and accelerators), and storage technologies (HDD, SSD, NVM, and NVMe-SSD).

16 The High-Performance Big Data (HiBD) Project: RDMA for Apache Spark; RDMA for Apache Hadoop 2.x (RDMA-Hadoop-2.x), with plugins for the Apache, Hortonworks (HDP), and Cloudera (CDH) Hadoop distributions; RDMA for Apache HBase; RDMA for Memcached (RDMA-Memcached); RDMA for Apache Hadoop 1.x (RDMA-Hadoop); and the OSU HiBD-Benchmarks (OHB): HDFS, Memcached, HBase, and Spark micro-benchmarks. http://hibd.cse.ohio-state.edu. User base: 230 organizations from 30 countries; more than 21,800 downloads from the project site. Available for InfiniBand and RoCE; also runs on Ethernet.
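
A rough sketch in the spirit of the OHB Memcached micro-benchmarks (not the OHB code itself): measuring average set/get latency for one value size against a server assumed to be on localhost:11211.

```python
import time
from pymemcache.client.base import Client

client = Client(("localhost", 11211))
payload = b"x" * 4096   # one 4 KB value; OHB sweeps many sizes
iters = 1000

start = time.perf_counter()
for i in range(iters):
    client.set(f"key-{i}", payload)
set_us = (time.perf_counter() - start) / iters * 1e6

start = time.perf_counter()
for i in range(iters):
    client.get(f"key-{i}")
get_us = (time.perf_counter() - start) / iters * 1e6

print(f"avg set: {set_us:.1f} us, avg get: {get_us:.1f} us")
```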

17 RDMA for Apache Hadoop 2.x Distribution: high-performance design of Hadoop over RDMA-enabled interconnects. RDMA-enhanced design with native InfiniBand and RoCE support at the verbs level for the HDFS, MapReduce, and RPC components; enhanced HDFS with in-memory and heterogeneous storage; high-performance design of MapReduce over Lustre; Memcached-based burst buffer for MapReduce over Lustre-integrated HDFS (HHH-L-BB mode); plugin-based architecture supporting RDMA-based designs for Apache Hadoop, CDH, and HDP; easily configurable for different running modes (HHH, HHH-M, HHH-L, HHH-L-BB, and MapReduce over Lustre) and different protocols (native InfiniBand, RoCE, and IPoIB). Current release: 1.1.0, based on Apache Hadoop 2.7.1; compliant with Apache Hadoop 2.7.1, HDP, and CDH APIs and applications. Tested with Mellanox InfiniBand adapters (DDR, QDR, FDR, and EDR), RoCE support with Mellanox adapters, various multi-core platforms, and different file systems with disks, SSDs, and Lustre. http://hibd.cse.ohio-state.edu

18 Different Modes of RDMA for Apache Hadoop 2.x (see the configuration sketch below). HHH: Heterogeneous storage devices with hybrid replication schemes are supported in this mode of operation for better fault tolerance as well as performance; this mode is enabled by default in the package. HHH-M: A high-performance in-memory based setup that performs all I/O operations in memory to obtain as much performance benefit as possible. HHH-L: With parallel file systems integrated, HHH-L mode can take advantage of the Lustre file system available in the cluster. HHH-L-BB: This mode deploys a Memcached-based burst buffer system to reduce the bandwidth bottleneck of shared file system access; the burst buffer is hosted by Memcached servers, each of which has a local SSD. MapReduce over Lustre, with/without local disks: Besides the HDFS-based solutions, this package also supports running MapReduce jobs on top of Lustre alone, in two modes: with local disks and without local disks. Running with Slurm and PBS: Supports deploying RDMA for Apache Hadoop 2.x with Slurm and PBS in the different running modes (HHH, HHH-M, HHH-L, and MapReduce over Lustre).
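
A hedged sketch of how one might script mode selection by emitting core-site.xml properties; the property names below are illustrative placeholders, not the distribution's documented keys (the HiBD user guide lists the real ones).

```python
# Map each running mode to hypothetical configuration properties.
MODES = {
    "HHH":   {"hadoop.rdma.mode": "HHH"},       # hypothetical key: default mode
    "HHH-M": {"hadoop.rdma.mode": "HHH-M"},     # hypothetical key: in-memory I/O
    "HHH-L": {"hadoop.rdma.mode": "HHH-L",      # hypothetical key: Lustre-backed
              "hadoop.lustre.path": "/mnt/lustre"},
}

def to_core_site(props):
    """Render a property dict as a core-site.xml <configuration> block."""
    rows = "\n".join(
        f"  <property><name>{k}</name><value>{v}</value></property>"
        for k, v in props.items())
    return f"<configuration>\n{rows}\n</configuration>"

print(to_core_site(MODES["HHH-L"]))
```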

19 RDMA for Apache Spark Distribution: high-performance design of Spark over RDMA-enabled interconnects. RDMA-enhanced design with native InfiniBand and RoCE support at the verbs level for Spark; RDMA-based data shuffle with a SEDA-based shuffle architecture; support for pre-connection, on-demand connection, and connection sharing; non-blocking and chunk-based data transfer; off-JVM-heap buffer management; easily configurable for different protocols (native InfiniBand, RoCE, and IPoIB). Current release: 0.9.4, based on Apache Spark 2.1.0. Tested with Mellanox InfiniBand adapters (DDR, QDR, FDR, and EDR), RoCE support with Mellanox adapters, various multi-core platforms, and RAM disks, SSDs, and HDDs. http://hibd.cse.ohio-state.edu
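
A hedged sketch of switching shuffle protocols through Spark configuration; the two RDMA-specific keys are illustrative placeholders (the RDMA-Spark user guide documents the actual property names).

```python
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setAppName("rdma-spark-config-sketch")
        # Hypothetical plugin switches, named here only for illustration:
        .set("spark.shuffle.rdma.enabled", "true")
        .set("spark.shuffle.rdma.protocol", "IB"))   # or "RoCE", "IPoIB"

sc = SparkContext(conf=conf)
print(sc.getConf().get("spark.shuffle.rdma.protocol"))
sc.stop()
```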

20 HiBD Packages on SDSC Comet and Chameleon Cloud. RDMA for Apache Hadoop 2.x and RDMA for Apache Spark are installed and available on SDSC Comet. Examples for the various modes of usage are available in: RDMA for Apache Hadoop 2.x: /share/apps/examples/hadoop; RDMA for Apache Spark: /share/apps/examples/spark/. Please contact SDSC support (reference Comet as the machine, and SDSC as the site) if you have any further questions about usage and configuration. RDMA for Apache Hadoop is also available on Chameleon Cloud as an appliance. M. Tatineni, X. Lu, D. J. Choi, A. Majumdar, and D. K. Panda, Experiences and Benefits of Running RDMA Hadoop and Spark on SDSC Comet, XSEDE '16, July 2016.

21 Performance Numbers of RDMA for Apache Hadoop 2.x: RandomWriter & TeraGen on OSU-RI2 (EDR). [Slide charts: execution time (s) vs. data size (GB) for RandomWriter and TeraGen, comparing IPoIB (EDR) against OSU-IB (EDR).] Cluster with 8 nodes and a total of 64 maps. RandomWriter: 3x improvement over IPoIB for 80-160 GB file size (execution time reduced by 3x). TeraGen: 4x improvement over IPoIB for 80-240 GB file size (reduced by 4x).
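
A small harness sketch for reproducing a TeraGen timing like the one above with the stock Hadoop examples jar (the jar path below is an assumption; adjust to your installation). TeraGen rows are 100 bytes each, so 10^9 rows is roughly 100 GB.

```python
import subprocess
import time

# Assumed location of the MapReduce examples jar; varies by install.
EXAMPLES_JAR = "/opt/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples.jar"
rows = 1_000_000_000  # ~100 GB at 100 bytes per row

start = time.perf_counter()
subprocess.run(["hadoop", "jar", EXAMPLES_JAR, "teragen",
                str(rows), "/benchmarks/teragen-out"], check=True)
print(f"TeraGen took {time.perf_counter() - start:.1f} s")
```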

22 Performance Numbers of RDMA for Apache Hadoop 2.x: Sort & TeraSort on OSU-RI2 (EDR). [Slide charts: execution time (s) vs. data size (GB) for Sort and TeraSort, comparing IPoIB (EDR) against OSU-IB (EDR).] Sort: cluster with 8 nodes, 64 maps, and 14 reduces; 61% improvement over IPoIB for 80-160 GB data (reduced by 61%). TeraSort: cluster with 8 nodes, 64 maps, and 32 reduces; 18% improvement over IPoIB for 80-240 GB data (reduced by 18%).

23 Design Overview of Spark with RDMA. [Slide figure: above Spark Core, benchmarks/applications/libraries/frameworks; Spark Core's Shuffle Manager (Sort, Hash, Tungsten-Sort) uses the Block Transfer Service (Netty, NIO, RDMA plugin); Netty and NIO servers/clients go through the Java socket interface, while the RDMA server/client goes through the Java Native Interface (JNI) to a native RDMA-based communication engine over RDMA-capable networks (IB, iWARP, RoCE, etc.) alongside 1/10/40/100 GigE and IPoIB.] Design features: RDMA-based shuffle plugin; SEDA-based architecture; dynamic connection management and sharing; non-blocking data transfer; off-JVM-heap buffer management; InfiniBand/RoCE support. This enables high-performance RDMA communication while still supporting the traditional socket interface; the JNI layer bridges Scala-based Spark with the communication library written in native code. X. Lu, M. W. Rahman, N. Islam, D. Shankar, and D. K. Panda, Accelerating Spark with RDMA for Big Data Processing: Early Experiences, Int'l Symposium on High Performance Interconnects (HotI '14), August 2014. X. Lu, D. Shankar, S. Gugnani, and D. K. Panda, High-Performance Design of Apache Spark with RDMA and Its Benefits on Various Workloads, IEEE BigData '16, December 2016.

24 Performance Evaluation on SDSC Comet: SortBy/GroupBy. [Slide charts: total time (sec) vs. data size (GB) for SortByTest and GroupByTest, IPoIB vs. RDMA, on 64 worker nodes with 1536 cores.] InfiniBand FDR, SSD, 64 worker nodes, 1536 cores (1536 maps, 1536 reduces); RDMA vs. IPoIB with 1536 concurrent tasks and a single SSD per node. SortBy: total time reduced by up to 80% over IPoIB (56Gbps). GroupBy: total time reduced by up to 74% over IPoIB (56Gbps).
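
A simplified PySpark sketch of the SortBy/GroupBy workload shape above; the SDSC runs used the RDMA-Spark package and far larger inputs, so this is only the benchmark's skeleton.

```python
import random
import time
from pyspark import SparkContext

sc = SparkContext(appName="sortby-groupby-sketch")
pairs = (sc.parallelize(range(10_000_000))
           .map(lambda i: (random.randrange(1_000_000), i)))

start = time.perf_counter()
pairs.sortByKey().count()      # shuffle-heavy sort stage
print(f"SortBy:  {time.perf_counter() - start:.1f} s")

start = time.perf_counter()
pairs.groupByKey().count()     # shuffle-heavy group stage
print(f"GroupBy: {time.perf_counter() - start:.1f} s")
sc.stop()
```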

25 Performance Evaluation on SDSC Comet: HiBench PageRank. [Slide charts: total time (sec) for the Huge, BigData, and Gigantic data sizes, IPoIB vs. RDMA, on 32 worker nodes (768 cores) and 64 worker nodes (1536 cores).] InfiniBand FDR, SSD, 32/64 worker nodes, 768/1536 cores (768/1536 maps, 768/1536 reduces); RDMA vs. IPoIB with 768/1536 concurrent tasks and a single SSD per node. 32 nodes/768 cores: total time reduced by 37% over IPoIB (56Gbps). 64 nodes/1536 cores: total time reduced by 43% over IPoIB (56Gbps).
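
A compact PageRank sketch matching the shape of the HiBench workload above (a toy four-node graph; HiBench generates the Huge/BigData/Gigantic inputs). Each iteration is one shuffle round, which is where RDMA helps.

```python
from pyspark import SparkContext

sc = SparkContext(appName="pagerank-sketch")
links = sc.parallelize([("a", ["b", "c"]), ("b", ["c"]),
                        ("c", ["a"]), ("d", ["c"])]).cache()
ranks = links.mapValues(lambda _: 1.0)

for _ in range(10):  # 10 iterations, each a join + reduce shuffle
    contribs = links.join(ranks).flatMap(
        lambda kv: [(dst, kv[1][1] / len(kv[1][0])) for dst in kv[1][0]])
    ranks = (contribs.reduceByKey(lambda a, b: a + b)
                     .mapValues(lambda s: 0.15 + 0.85 * s))

print(sorted(ranks.collect()))
sc.stop()
```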

26 Evaluation with BigDL on RDMA-Spark. [Slide chart: one-epoch time (sec) vs. number of cores, IPoIB vs. RDMA.] VGG training model on the CIFAR-10 dataset, evaluated on the SDSC Comet supercomputer. Initial results: RDMA-based Spark outperforms default Spark over IPoIB by a factor of 4.58x.

27 Design Overview of NVM- and RDMA-aware HDFS (NVFS). [Slide figure: applications and benchmarks (Hadoop MapReduce, Spark, HBase) over the DFSClient and DataNode, with RDMA sender/receiver and RDMA writer/reader paths plus a replicator; NVFS-BlkIO and NVFS-MemIO back HDFS I/O with NVM and SSDs.] Design features: RDMA over NVM; HDFS I/O with NVM via both block access and memory access; a hybrid design using NVM with SSD as hybrid storage for HDFS I/O; co-design with Spark and HBase for cost-effectiveness and use cases. N. S. Islam, M. W. Rahman, X. Lu, and D. K. Panda, High Performance Design for HDFS with Byte-Addressability of NVM and RDMA, International Conference on Supercomputing (ICS '16), June 2016.

28 Evaluation with Hadoop MapReduce: TestDFSIO. [Slide charts: average throughput (MBps) for Write and Read, comparing HDFS (56Gbps), NVFS-BlkIO (56Gbps), and NVFS-MemIO (56Gbps).] TestDFSIO on SDSC Comet (32 nodes): Write: NVFS-MemIO gains 4x over HDFS; Read: NVFS-MemIO gains 1.2x over HDFS. TestDFSIO on OSU Nowlab (4 nodes): Write: NVFS-MemIO gains 4x over HDFS; Read: NVFS-MemIO gains 2x over HDFS.

29 Overview of the RDMA-Hadoop-Virt Architecture. [Slide figure: Big Data applications (CloudBurst, MR-MS Polygraph, and others) on MapReduce, YARN, HDFS, and HBase over Hadoop Common, deployed across virtual machines, containers, and bare-metal nodes.] Virtualization-aware modules in all four main Hadoop components: HDFS: virtualization-aware block management to improve fault tolerance; YARN: extensions to the container allocation policy to reduce network traffic; MapReduce: extensions to the map task scheduling policy to reduce network traffic; Hadoop Common: a topology detection module for automatic topology detection. Communications in HDFS, MapReduce, and RPC go through RDMA-based designs over SR-IOV-enabled InfiniBand. S. Gugnani, X. Lu, and D. K. Panda, Designing Virtualization-aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-enabled Clouds, CloudCom, 2016.

30 Evaluation with Applications: CloudBurst and Self-Join. [Slide charts: execution time of RDMA-Hadoop vs. RDMA-Hadoop-Virt in Default Mode and Distributed Mode, showing 30% (CloudBurst) and 55% (Self-Join) reductions.] 14% and 24% improvement with Default Mode for CloudBurst and Self-Join, respectively; 30% and 55% improvement with Distributed Mode for CloudBurst and Self-Join.

31 Deep Learning: New Challenges for MPI Runtimes. Deep Learning frameworks are a different game altogether: unusually large message sizes (on the order of megabytes), with most communication based on GPU buffers. How do we address these newer requirements? GPU-specific communication libraries: NVIDIA's NCCL library provides inter-GPU communication. CUDA-aware MPI (MVAPICH2-GDR) provides support for GPU-based communication. Can we exploit CUDA-aware MPI and NCCL together to support Deep Learning applications, via hierarchical communication (knomial internode + NCCL ring intranode)? [Slide figure: internode communication with a knomial tree across nodes 1-4, and intranode communication with an NCCL ring across GPUs behind PLX switches.]

32 Efficient Broadcast: MVAPICH2-GDR and NCCL. NCCL has some limitations: it only works within a single node, so there is no scale-out across multiple nodes, and it suffers degradation across the IOH (socket) for scale-up within a node. We propose an optimized MPI_Bcast supporting communication of very large GPU buffers (on the order of megabytes) and scale-out on large numbers of dense multi-GPU nodes, using hierarchical communication that efficiently exploits the CUDA-aware MPI_Bcast in MV2-GDR and the NCCL broadcast primitive. Performance benefits: up to 2.2x in OSU micro-benchmarks (MV2-GDR-Opt vs. MV2-GDR across GPU counts), and 25% average improvement for the Microsoft CNTK DL framework. Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning, A. Awan, K. Hamidouche, A. Venkatesh, and D. K. Panda, The 23rd European MPI Users' Group Meeting (EuroMPI '16), September 2016 [Best Paper Runner-Up].
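
A hedged mpi4py sketch of the hierarchical broadcast idea: an internode broadcast among node leaders, then an intranode broadcast. MVAPICH2-GDR pairs a knomial internode step with an NCCL ring intranode on GPU buffers; plain MPI on host memory stands in for both stages here.

```python
import numpy as np
from mpi4py import MPI

world = MPI.COMM_WORLD
node = world.Split_type(MPI.COMM_TYPE_SHARED)      # ranks sharing one node
is_leader = node.rank == 0
leaders = world.Split(0 if is_leader else MPI.UNDEFINED, world.rank)

buf = np.zeros(1 << 20, dtype=np.float32)          # 4 MB message
if world.rank == 0:
    buf[:] = 1.0                                   # root fills the buffer

if leaders != MPI.COMM_NULL:
    leaders.Bcast(buf, root=0)                     # stage 1: internode
node.Bcast(buf, root=0)                            # stage 2: intranode
assert buf[0] == 1.0
```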

33 Large-Message Optimized Collectives for Deep Learning. MV2-GDR provides optimized collectives for large message sizes: optimized Reduce, Allreduce, and Bcast, with good scaling to large numbers of GPUs; available with MVAPICH2-GDR 2.2GA. [Slide charts: latency (ms) vs. message size (MB) for Reduce (192 GPUs), Allreduce (64 GPUs), and Bcast (64 GPUs), and latency vs. number of GPUs for Reduce (64 MB), Allreduce, and Bcast (128 MB).]
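
A minimal mpi4py latency sketch for a large-message Allreduce like the plots above (the compiled OSU micro-benchmarks are the real measurement tool; host buffers stand in for GPU buffers).

```python
import time
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
msg = np.ones(16 << 20, dtype=np.float32)   # 64 MB of float32
out = np.empty_like(msg)

comm.Barrier()                              # line up all ranks
start = time.perf_counter()
for _ in range(10):
    comm.Allreduce(msg, out, op=MPI.SUM)
comm.Barrier()
if comm.rank == 0:
    avg_ms = (time.perf_counter() - start) / 10 * 1e3
    print(f"64 MB Allreduce: {avg_ms:.2f} ms")
```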

34 OSU-Caffe: Scalable Deep Learning. Caffe is a flexible and layered Deep Learning framework; its benefits and weaknesses: multi-GPU training within a single node, but performance degradation for GPUs across different sockets and limited scale-out. OSU-Caffe: MPI-based parallel training that enables scale-up (within a node) and scale-out (across multi-GPU nodes). [Slide chart: training time (seconds) for GoogLeNet (ImageNet) on up to 128 GPUs, comparing Caffe, OSU-Caffe (1024), and OSU-Caffe (2048); default Caffe is an invalid use case at that scale.] A. A. Awan, K. Hamidouche, J. Hashmi, and D. K. Panda, S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters, PPoPP '17. OSU-Caffe is publicly available from: http://hidl.cse.ohio-state.edu
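
A toy sketch of the data-parallel pattern OSU-Caffe implements: every rank computes local gradients, then an Allreduce averages them before the weight update. OSU-Caffe does this inside Caffe with CUDA-aware MPI on GPU buffers; numpy on host memory stands in here.

```python
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
weights = np.zeros(1024, dtype=np.float32)

for step in range(100):
    grad = np.random.rand(1024).astype(np.float32)  # stand-in local gradient
    summed = np.empty_like(grad)
    comm.Allreduce(grad, summed, op=MPI.SUM)        # sum across all ranks
    weights -= 0.01 * (summed / comm.size)          # average, then SGD step
```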

35 Open Challenges in Designing Communication and I/O Middleware for High-Performance Big Data Processing: high-performance designs for Big Data middleware; NVM-aware communication and I/O schemes for Big Data; SATA-/PCIe-/NVMe-SSD support; high-bandwidth memory support; threaded models and synchronization; locality-aware designs; fault tolerance/resiliency (migration support with virtual machines, data replication); efficient data access and placement policies; efficient task scheduling; fast deployment and automatic configuration on clouds; optimization for Deep Learning applications.

36 Sunrise or Sunset of Big Data Software? Assuming 6:00 am as sunrise and 6:00 pm as sunset, we are at 8:00 am.

37 Thank You! panda@cse.ohio-state.edu http://www.cse.ohio-state.edu/~panda Network-Based Computing Laboratory: http://nowlab.cse.ohio-state.edu/ The High-Performance Big Data Project: http://hibd.cse.ohio-state.edu/
