Acceleration for Big Data, Hadoop and Memcached
1 Acceleration for Big Data, Hadoop and Memcached. A Presentation at HPC Advisory Council Workshop, Lugano 2012, by Dhabaleswar K. (DK) Panda, The Ohio State University, panda@cse.ohio-state.edu
2 Recap of Last Two Days' Presentations: MPI is a dominant programming model for HPC systems. Introduced some of the MPI features and their usage; introduced the MVAPICH2 stack; illustrated many performance optimizations and tuning techniques for MVAPICH2; provided an overview of MPI-3 features; introduced challenges in designing MPI for Exascale systems; presented approaches being taken by MVAPICH2 for Exascale systems.
3 High-Performance Networks in the Top500: the percentage share of InfiniBand is steadily increasing.
4 Use of High-Performance Networks for Scientific Computing: The OpenFabrics software stack with IB, iWARP and RoCE interfaces is driving HPC systems, through the Message Passing Interface (MPI) and parallel file systems. Almost 11.5 years of research and development since InfiniBand was introduced in October 2000. Other programming models are emerging to take advantage of high-performance networks: UPC, SHMEM.
5 One-way Latency: MPI over IB (small and large messages)
[Plots: one-way latency (us) vs. message size (bytes) for MVAPICH-Qlogic-DDR, MVAPICH-Qlogic-QDR, MVAPICH-ConnectX-DDR, MVAPICH-ConnectX2-PCIe2-QDR and MVAPICH-ConnectX3-PCIe3-FDR.]
DDR and QDR results: quad-core (Westmere) Intel platform, PCIe Gen2, with IB switch. FDR results: octa-core (Sandy Bridge) Intel platform, PCIe Gen3, without IB switch.
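As an illustration of how such one-way latency numbers are typically obtained, here is a minimal MPI ping-pong sketch in the spirit of the OSU micro-benchmarks (osu_latency); the iteration count and message size are illustrative, and this is not the actual benchmark source.

```c
/* Minimal ping-pong sketch of how one-way MPI latency is commonly measured.
 * Run with two ranks, e.g. mpirun -np 2 ./latency
 * (real benchmarks also discard warm-up iterations). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int iters = 10000, msg_size = 4;   /* 4-byte messages, for example */
    char buf[4] = {0};
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    double start = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, msg_size, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, msg_size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0)  /* one-way latency = half the measured round-trip time */
        printf("avg one-way latency: %.2f us\n",
               elapsed * 1e6 / (2.0 * iters));

    MPI_Finalize();
    return 0;
}
```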
6 Bandwidth: MPI over IB (unidirectional and bidirectional)
[Plots: bandwidth (MBytes/sec) vs. message size (bytes) for MVAPICH-Qlogic-DDR, MVAPICH-Qlogic-QDR, MVAPICH-ConnectX-DDR, MVAPICH-ConnectX2-PCIe2-QDR and MVAPICH-ConnectX3-PCIe3-FDR.]
DDR and QDR results: quad-core (Westmere) Intel platform, PCIe Gen2, with IB switch. FDR results: octa-core (Sandy Bridge) Intel platform, PCIe Gen3, without IB switch.
7 Large-scale InfiniBand Installations
209 IB clusters (41.8%) in the November 2011 Top500 list. Installations in the Top 30 (13 systems):
- 120,640 cores (Nebulae) in China (4th)
- 73,278 cores (Tsubame-2.0) in Japan (5th)
- 111,104 cores (Pleiades) at NASA Ames (7th)
- 138,368 cores (Tera-100) in France (9th)
- 122,400 cores (RoadRunner) at LANL (10th)
- 137,200 cores (Sunway Blue Light) in China (14th)
- 46,208 cores (Zin) at LLNL (15th)
- 33,072 cores (Lomonosov) in Russia (18th)
- 29,440 cores (Mole-8.5) in China (21st)
- 42,440 cores (Red Sky) at Sandia (24th)
- 62,976 cores (Ranger) at TACC (25th)
- 20,480 cores (Bull Benchmarks) in France (27th)
- 20,480 cores (Helios) in Japan (28th)
More are getting installed!
8 Enterprise/Commercial Computing Focuses on big data and data analytics Multiple environments and middleware are gaining momentum Hadoop (HDFS, HBase and MapReduce) Memcached 8
9 Can High-Performance Interconnects Benefit Enterprise Computing? Most of the current enterprise systems use 1GE Concerns for performance and scalability Usage of High-Performance Networks is beginning to draw interest Oracle, IBM, Google are working along these directions What are the challenges? Where do the bottlenecks lie? Can these bottlenecks be alleviated with new designs (similar to the designs adopted for MPI)? 9
10 Presentation Outline Overview of Hadoop, Memcached and HBase Challenges in Accelerating Enterprise Middleware Designs and Case Studies Memcached HBase HDFS Conclusion and Q&A 1
11 Memcached Architecture
[Diagram: web frontend servers (Memcached clients) connect over high-performance networks to Memcached servers (main memory, CPUs, SSD/HDD), which sit in front of the database servers.]
Integral part of the Web 2.0 architecture: a distributed caching layer that allows spare memory from multiple nodes to be aggregated. General purpose; typically used to cache database queries and results of API calls. Scalable model, but typical usage is very network intensive.
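To make the caching pattern concrete, here is a minimal sketch of the cache-aside usage described above, written against the standard libmemcached client API; the server address, key, value and expiry time are illustrative, not taken from the talk.

```c
/* Cache-aside pattern with libmemcached: look up a key in Memcached and,
 * on a miss, fetch from the database (stubbed) and populate the cache. */
#include <libmemcached/memcached.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    memcached_return_t rc;
    memcached_st *memc = memcached_create(NULL);
    memcached_server_st *servers =
        memcached_server_list_append(NULL, "127.0.0.1", 11211, &rc);
    memcached_server_push(memc, servers);

    const char *key = "user:42:profile";          /* hypothetical cache key */

    size_t len; uint32_t flags;
    char *val = memcached_get(memc, key, strlen(key), &len, &flags, &rc);
    if (rc == MEMCACHED_SUCCESS) {
        printf("cache hit: %.*s\n", (int)len, val);
        free(val);
    } else {
        /* Cache miss: query the database (not shown), then populate the cache. */
        const char *db_result = "{\"name\":\"example\"}";
        memcached_set(memc, key, strlen(key),
                      db_result, strlen(db_result),
                      (time_t)300, (uint32_t)0);  /* 300 s expiry, no flags */
    }

    memcached_server_list_free(servers);
    memcached_free(memc);
    return 0;
}
```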
12 Hadoop Architecture
Underlying Hadoop Distributed File System (HDFS), with fault tolerance achieved by replicating data blocks. NameNode: stores information on data blocks. DataNodes: store blocks and host MapReduce computation. JobTracker: tracks jobs and detects failures. The model scales, but there is a high amount of communication during intermediate phases.
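For reference, a minimal sketch of an HDFS client writing a replicated file through the standard libhdfs C API; the NameNode address, path and replication factor here are illustrative assumptions, not values from the talk.

```c
/* Write a small file to HDFS via libhdfs: the client contacts the NameNode
 * for block placement, and the data is streamed to the DataNodes. */
#include <hdfs.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Connect to the NameNode, which tracks block locations. */
    hdfsFS fs = hdfsConnect("namenode.example.com", 9000);
    if (!fs) { fprintf(stderr, "connect failed\n"); return 1; }

    /* Open a file with replication factor 3 (the HDFS default). */
    hdfsFile f = hdfsOpenFile(fs, "/tmp/demo.txt", O_WRONLY | O_CREAT,
                              0 /* default buffer size */, 3 /* replication */,
                              0 /* default block size */);
    if (!f) { fprintf(stderr, "open failed\n"); return 1; }

    const char *msg = "hello hdfs\n";
    hdfsWrite(fs, f, (void *)msg, (tSize)strlen(msg)); /* DataNodes store the blocks */
    hdfsFlush(fs, f);
    hdfsCloseFile(fs, f);
    hdfsDisconnect(fs);
    return 0;
}
```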
13 Network-Level Interaction Between Clients and Data Nodes in HDFS
[Diagram: HDFS clients communicate over high-performance networks with HDFS DataNodes backed by HDD/SSD storage.]
14 Overview of HBase Architecture: An open-source database project based on the Hadoop framework for hosting very large tables. Major components: HBaseMaster, HRegionServer and HBaseClient. HBase and HDFS are deployed in the same cluster to get better data locality.
15 Network-Level Interaction Between HBase Clients, Region Servers and Data Nodes
[Diagram: HBase clients communicate over high-performance networks with HRegionServers, which in turn communicate with HDFS DataNodes backed by HDD/SSD storage.]
16 Presentation Outline Overview of Hadoop, Memcached and HBase Challenges in Accelerating Enterprise Middleware Designs and Case Studies Memcached HBase HDFS Conclusion and Q&A 16
17 Designing Communication and I/O Libraries for Enterprise Systems: Challenges
[Layered stack: Applications; Datacenter Middleware (HDFS, HBase, MapReduce, Memcached); Programming Models (Sockets); Communication and I/O Library (point-to-point communication, threading models and synchronization, I/O and file systems, QoS, fault tolerance); Networking Technologies (InfiniBand, 1/10/40 GigE, RNICs and intelligent NICs); Commodity Computing System Architectures (single, dual, quad, ..); Multi-/Many-core Architectures and Accelerators; Storage Technologies (HDD or SSD).]
18 Common Protocols using OpenFabrics
[Diagram: application interfaces (Sockets or Verbs) map onto different protocol implementations, adapters and switches.]
- 1/10/40 GigE: sockets over kernel-space TCP/IP on an Ethernet adapter and switch
- IPoIB: sockets over kernel-space TCP/IP on an InfiniBand adapter and switch
- 10/40 GigE-TOE: sockets over hardware-offloaded TCP/IP on an Ethernet adapter
- SDP: sockets over RDMA on an InfiniBand adapter
- iWARP: user-space RDMA on an iWARP-capable Ethernet adapter
- RoCE: user-space RDMA on a RoCE-capable Ethernet adapter
- IB Verbs: native user-space verbs on an InfiniBand adapter
19 Can New Data Analysis and Management Systems be Designed with High-Performance Networks and Protocols?
Current design: Application -> Sockets -> 1/10 GigE network. Enhanced designs: Application -> Accelerated Sockets (Verbs / hardware offload) -> 10 GigE or InfiniBand. Our approach: Application -> OSU design -> Verbs interface -> 10 GigE or InfiniBand.
Sockets were not designed for high performance: stream semantics often mismatch the upper layers (Memcached, HBase, Hadoop), and zero-copy is not available for non-blocking sockets.
20 Interplay between Storage and Interconnect/Protocols: Most current-generation enterprise systems use traditional hard disks. Since hard disks are slower, high-performance communication protocols may have little impact. SSDs and other storage technologies are emerging; does that change the landscape?
21 Presentation Outline Overview of Hadoop, Memcached and HBase Challenges in Accelerating Enterprise Middleware Designs and Case Studies Memcached HBase HDFS Conclusion and Q&A 21
22 Memcached Design Using Verbs
[Diagram: sockets clients and RDMA clients connect to a master thread, which hands them to sockets worker threads or verbs worker threads; all workers share the Memcached data (memory slabs, items).]
The server and client perform a negotiation protocol, and the master thread assigns each client to the appropriate worker thread. Once a client is assigned to a verbs worker thread, it communicates directly with that thread and stays bound to it; each verbs worker thread can support multiple clients. All other Memcached data structures are shared between the RDMA and sockets worker threads. Memcached applications need not be modified: the verbs interface is used if available, and the Memcached server can serve both sockets and verbs clients simultaneously.
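A simplified, self-contained sketch of the dispatch logic described on this slide (not the actual OSU Memcached code): a master thread runs a per-client negotiation and hands the connection to either a sockets worker or a verbs worker, while the Memcached data structures stay shared. The negotiation and the RDMA path are stubbed out, and helper names such as negotiate_transport() are hypothetical.

```c
/* Master-thread dispatch between sockets and verbs (RDMA) worker threads. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

enum transport { TRANSPORT_SOCKETS, TRANSPORT_VERBS };

struct client { int id; enum transport t; };

/* Hypothetical negotiation: in the real design the server and client exchange
 * a message over the initial socket to decide whether verbs can be used. */
static enum transport negotiate_transport(int client_id)
{
    return (client_id % 2) ? TRANSPORT_VERBS : TRANSPORT_SOCKETS;
}

static void *sockets_worker(void *arg)
{
    struct client *c = arg;
    printf("client %d served by sockets worker\n", c->id);
    free(c);
    return NULL;
}

static void *verbs_worker(void *arg)
{
    struct client *c = arg;
    /* In the real design this thread drives RDMA send/recv on a QP bound to
     * the client; the shared data (slabs, items) is untouched by the switch. */
    printf("client %d served by verbs worker\n", c->id);
    free(c);
    return NULL;
}

int main(void)
{
    pthread_t tid[4];
    for (int i = 0; i < 4; i++) {            /* master thread: accept loop */
        struct client *c = malloc(sizeof(*c));
        c->id = i;
        c->t = negotiate_transport(i);
        pthread_create(&tid[i], NULL,
                       c->t == TRANSPORT_VERBS ? verbs_worker : sockets_worker,
                       c);
    }
    for (int i = 0; i < 4; i++)
        pthread_join(tid[i], NULL);
    return 0;
}
```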
23 Experimental Setup
Hardware:
- Intel Clovertown: each node has 8 processor cores on 2 Intel Xeon 2.33 GHz quad-core CPUs, 6 GB main memory, 250 GB hard disk; network: 1GigE, IPoIB, 10GigE TOE and IB (DDR)
- Intel Westmere: each node has 8 processor cores on 2 Intel Xeon 2.67 GHz quad-core CPUs, 12 GB main memory, 160 GB hard disk; network: 1GigE, IPoIB, and IB (QDR)
Software: Memcached server; Memcached client (libmemcached) 0.52
In all experiments the memtable is contained in memory (no disk access involved).
24 Memcached Get Latency (Small Message)
[Plots: latency (us) vs. message size for SDP, IPoIB, 1GigE, 10GigE (TOE, DDR cluster only), OSU-RC-IB and OSU-UD-IB on the Intel Clovertown cluster (IB: DDR) and the Intel Westmere cluster (IB: QDR).]
Memcached Get latency: 4 bytes, RC/UD - DDR: 6.82/7.55 us; QDR: 4.28/4.86 us. 2K bytes, RC/UD - DDR: 12.31/12.78 us; QDR: 8.19/8.46 us. Almost a factor of four improvement over 10GigE (TOE) for 2K bytes on the DDR cluster.
25 Memcached Get Latency (Large Message)
[Plots: latency (us) vs. message size (2K-512K bytes) for SDP, IPoIB, 1GigE, 10GigE (TOE, DDR cluster only), OSU-RC-IB and OSU-UD-IB on the Intel Clovertown cluster (IB: DDR) and the Intel Westmere cluster (IB: QDR).]
Memcached Get latency: 8K bytes, RC/UD - DDR: 18.9/19.1 us; QDR: 11.8/12.2 us. 512K bytes, RC/UD - DDR: 369/43 us; QDR: 173/23 us. Almost a factor of two improvement over 10GigE (TOE) for 512K bytes on the DDR cluster.
26 Memcached Get TPS (4 bytes)
[Plots: thousands of transactions per second (TPS) vs. number of clients for SDP, OSU-RC-IB, OSU-UD-IB, IPoIB and 1GigE.]
Memcached Get transactions per second for 4-byte messages: on IB QDR, 1.4M TPS (RC) and 1.3M TPS (UD) for 8 clients. Significant improvement with native IB QDR compared to SDP and IPoIB.
27 Memcached - Memory Scalability
[Plot: memory footprint (MB) vs. number of clients (up to 4K) for SDP, OSU-RC-IB, OSU-UD-IB, IPoIB, 1GigE and OSU-Hybrid-IB.]
Steady memory footprint for the UD design (~2 MB); the RC memory footprint increases with the number of clients (~5 MB for 4K clients).
28 Application Level Evaluation - Olio Benchmark
[Plots: time (ms) vs. number of clients for SDP, IPoIB, OSU-RC-IB, OSU-UD-IB and OSU-Hybrid-IB.]
Olio benchmark: RC 1.6 sec, UD 1.9 sec, Hybrid 1.7 sec for 1024 clients; 4X better than IPoIB for 8 clients. The hybrid design achieves performance comparable to the pure RC design.
29 Application Level Evaluation - Real Application Workloads
[Plots: time (ms) vs. number of clients for SDP, IPoIB, OSU-RC-IB, OSU-UD-IB and OSU-Hybrid-IB.]
Real application workload: RC 32 ms, UD 318 ms, Hybrid 314 ms for 1024 clients; 12X better than IPoIB for 8 clients. The hybrid design achieves performance comparable to the pure RC design.
J. Jose, H. Subramoni, M. Luo, M. Zhang, J. Huang, W. Rahman, N. Islam, X. Ouyang, H. Wang, S. Sur and D. K. Panda, Memcached Design on High Performance RDMA Capable Interconnects, ICPP 11.
J. Jose, H. Subramoni, K. Kandalla, W. Rahman, H. Wang, S. Narravula and D. K. Panda, Scalable Memcached Design on InfiniBand using Hybrid Transports, CCGrid 12.
30 Presentation Outline Overview of Hadoop, Memcached and HBase Challenges in Accelerating Enterprise Middleware Designs and Case Studies Memcached HBase HDFS Conclusion and Q&A 3
31 HBase Design Using Verbs. Current design: HBase -> Sockets -> 1/10 GigE network. OSU design: HBase -> JNI interface -> OSU module -> InfiniBand (Verbs).
32 Experimental Setup
Hardware:
- Intel Clovertown: each node has 8 processor cores on 2 Intel Xeon 2.33 GHz quad-core CPUs, 6 GB main memory, 250 GB hard disk; network: 1GigE, IPoIB, 10GigE TOE and IB (DDR)
- Intel Westmere: each node has 8 processor cores on 2 Intel Xeon 2.67 GHz quad-core CPUs, 12 GB main memory, 160 GB hard disk; network: 1GigE, IPoIB, and IB (QDR)
3 nodes used: Node1 (NameNode & HBase Master), Node2 (DataNode & HBase RegionServer), Node3 (client)
Software: Hadoop 0.20.0, HBase 0.90.3 and Sun Java SDK 1.7.0
In all experiments the memtable is contained in memory (no disk access involved).
33 Details on Experiments
Key/value size: key size 2 bytes; value size 1KB/4KB.
Get operation: one key/value pair is inserted so that it stays in memory; the Get operation is repeated 8,000 times, and the first 4,000 iterations are skipped as warm-up.
Put operation: Memstore_Flush_Size is set to 256 MB, so no memory flush is involved; the Put operation is repeated 4,000 times, and the first 1,000 iterations are skipped as warm-up.
34 Get Operation (IB:DDR)
[Plots: latency (us) and throughput (operations/sec) for 1KB and 4KB messages, comparing 1GigE, 10GigE, IPoIB and the OSU design.]
HBase Get operation: 1K bytes - 65 us (15K TPS); 4K bytes - 11K TPS. Almost a factor of two improvement over 10GigE (TOE).
35 Get Operation (IB:QDR)
[Plots: latency (us) and throughput (operations/sec) for 1KB and 4KB messages, comparing 1GigE, IPoIB and the OSU design.]
HBase Get operation: 1K bytes - 47 us (22K TPS); 4K bytes - 16K TPS. Almost a factor of four improvement over IPoIB for 1KB.
36 Put Operation (IB:DDR)
[Plots: latency (us) and throughput (operations/sec) for 1KB and 4KB messages, comparing 1GigE, 10GigE, IPoIB and the OSU design.]
HBase Put operation: 1K bytes - 114 us (8.7K TPS); 4K bytes - 5.6K TPS. 34% improvement over 10GigE (TOE) for 1KB.
37 Put Operation (IB:QDR)
[Plots: latency (us) and throughput (operations/sec) for 1KB and 4KB messages, comparing 1GigE, IPoIB and the OSU design.]
HBase Put operation: 1K bytes - 78 us (13K TPS); 4K bytes - 8K TPS. A factor of two improvement over IPoIB for 1KB.
38 HBase Put/Get Detailed Analysis
[Stacked-bar plots breaking 1KB Put and Get time into communication, communication preparation, server processing, server serialization, client processing and client serialization, for 1GigE, IPoIB, 10GigE and OSU-IB.]
HBase 1KB Put: communication time 8.9 us, a factor of 6X improvement over 1GigE in communication time. HBase 1KB Get: communication time 8.9 us, a factor of 6X improvement over 1GigE in communication time.
W. Rahman, J. Huang, J. Jose, X. Ouyang, H. Wang, N. Islam, H. Subramoni, Chet Murthy and D. K. Panda, Understanding the Communication Characteristics in HBase: What are the Fundamental Bottlenecks?, ISPASS 12.
39 HBase Single Server-Multi-Client Results
[Plots: latency (us) and throughput (ops/sec) vs. number of clients for IPoIB, OSU-IB, 1GigE and 10GigE.]
HBase Get latency for 4 clients: 14.5 us. HBase Get throughput: 37.1 Kops/sec for 4 clients; 53.4 Kops/sec for 16 clients. 27% improvement in throughput for 16 clients over 1GigE.
40 HBase YCSB Read-Write Workload
[Plots: read and write latency (ms) vs. number of clients for IPoIB, OSU-IB, 1GigE and 10GigE.]
HBase read (Get) latency (Yahoo! Cloud Serving Benchmark): 64 clients - 2.0 ms; 128 clients - 3.5 ms; 42% improvement over IPoIB for 128 clients. HBase write (Put) latency: 64 clients - 1.9 ms; 128 clients - 3.5 ms; 40% improvement over IPoIB for 128 clients.
J. Huang, X. Ouyang, J. Jose, W. Rahman, H. Wang, M. Luo, H. Subramoni, Chet Murthy and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, IPDPS 12.
41 Presentation Outline Overview of Hadoop, Memcached and HBase Challenges in Accelerating Enterprise Middleware Designs and Case Studies Memcached HBase HDFS Conclusion and Q&A 41
42 Studies and Experimental Setup
Two kinds of designs and studies we have done: (1) studying the impact of HDD vs. SSD for HDFS, using unmodified Hadoop for the experiments; (2) a preliminary design of HDFS over Verbs.
Hadoop experiments: Intel Clovertown 2.33 GHz, 6 GB RAM, InfiniBand DDR, Chelsio T320; Intel X-25E 64 GB SSD and 250 GB HDD; Hadoop version 0.20.2, Sun/Oracle Java 1.6.0; dedicated NameNode and JobTracker; number of DataNodes used: 2, 4, and 8.
43 Hadoop: DFS IO Write Performance
[Plot: average write throughput (MB/sec) vs. file size (GB) on four DataNodes, for 1GigE, IPoIB, SDP and 10GigE-TOE, each with HDD and with SSD.]
DFS IO, included in Hadoop, measures sequential access throughput. We have two map tasks, each writing to a file of increasing size (1-10 GB). Significant improvement with IPoIB, SDP and 10GigE. With SSD, the performance improvement is almost seven or eight fold! SSD benefits are not seen without a high-performance interconnect.
44 Hadoop: RandomWriter Performance
[Plot: execution time (sec) vs. number of DataNodes (2, 4) for 1GigE, IPoIB, SDP and 10GigE-TOE, each with HDD and with SSD.]
Each map generates 1 GB of random binary data and writes it to HDFS. SSD improves execution time by 5% with 1GigE for two DataNodes. For four DataNodes, benefits are observed only with a high-performance interconnect: IPoIB, SDP and 10GigE can improve performance by 59% on four DataNodes.
45 Hadoop Sort Benchmark
[Plot: execution time (sec) vs. number of DataNodes (2, 4) for 1GigE, IPoIB, SDP and 10GigE-TOE, each with HDD and with SSD.]
Sort is the baseline benchmark for Hadoop; the sort phase is I/O bound and the reduce phase is communication bound. SSD improves performance by 28% using 1GigE with two DataNodes. Benefit of 5% on four DataNodes using SDP, IPoIB or 10GigE.
S. Sur, H. Wang, J. Huang, X. Ouyang and D. K. Panda, Can High-Performance Interconnects Benefit Hadoop Distributed File System?, MASVDC 10, in conjunction with MICRO 2010, Atlanta, GA.
46 HDFS Design Using Verbs. Current design: HDFS -> Sockets -> 1/10 GigE network. OSU design: HDFS -> JNI interface -> OSU module -> InfiniBand (Verbs).
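A hedged sketch of how the JNI hook in such a design could look: Java-side HDFS code calls a native method, and the C side would drive the verbs transfer. The Java package, class and method names are hypothetical and the RDMA transfer is stubbed; this illustrates the JNI plumbing only, not the actual OSU module.

```c
/* Native (JNI) side of a hypothetical RDMA block writer that a Java HDFS
 * module could call instead of using sockets. */
#include <jni.h>
#include <stdio.h>

/* Corresponds to a hypothetical Java declaration:
 *   package edu.osu.hdfs.rdma;
 *   class RdmaBlockWriter { native int writeBlock(byte[] data, int len); }
 */
JNIEXPORT jint JNICALL
Java_edu_osu_hdfs_rdma_RdmaBlockWriter_writeBlock(JNIEnv *env, jobject obj,
                                                  jbyteArray data, jint len)
{
    jbyte *buf = (*env)->GetByteArrayElements(env, data, NULL);
    if (buf == NULL)
        return -1;

    /* A real module would register `buf` with the HCA and post an RDMA
     * write/send toward the DataNode here, instead of printing. */
    printf("native module: would transfer %d bytes over verbs\n", (int)len);

    (*env)->ReleaseByteArrayElements(env, data, buf, JNI_ABORT);
    return len;
}
```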
47 RDMA-based Design for Native HDFS - Preliminary Results
[Plot: HDFS file write time (ms) vs. file size (GB) for 1GigE, IPoIB, 10GigE and the OSU design; HDFS file write experiment using four DataNodes on the IB-DDR cluster.]
For a 5 GB file size: 2% improvement over IPoIB, 14% improvement over 1GigE.
48 Presentation Outline Overview of Hadoop, Memcached and HBase Challenges in Accelerating Enterprise Middleware Designs and Case Studies Memcached HBase HDFS Conclusion and Q&A 48
49 Concluding Remarks: InfiniBand with the RDMA feature is gaining momentum in HPC systems, delivering the best performance and seeing greater usage. It is possible to use the RDMA feature in enterprise environments to accelerate big data processing; we presented some initial designs and performance numbers. Many open research challenges remain to be solved so that middleware for enterprise environments can take advantage of modern high-performance networks, multi-core technologies and emerging storage technologies.
50 Designing Communication and I/O Libraries for Enterprise Systems: Solved a Few Initial Challenges
[Same layered stack as slide 17: Applications; Datacenter Middleware (HDFS, HBase, MapReduce, Memcached); Programming Models (Sockets); Communication and I/O Library; Networking Technologies (InfiniBand, 1/10/40 GigE, RNICs and intelligent NICs); Commodity Computing System Architectures; Multi-/Many-core Architectures and Accelerators; Storage Technologies (HDD or SSD).]
51 Web Pointers MVAPICH Web Page 51