In the multi-core age, how do larger, faster and cheaper and more responsive memory sub-systems affect data management?


1 In the multi-core age, how do larger, faster and cheaper and more responsive memory sub-systems affect data management?
Panel at ADMS 2011
Dhabaleswar K. (DK) Panda
Network-Based Computing Laboratory
Department of Computer Science and Engineering
The Ohio State University

2 Motivation
- Modern servers provide large amounts of memory and multicore processors on a per-node basis
- SSD is emerging as a replacement for [...]
- Can the huge amount of memory across a set of servers provide new opportunities for designing data management systems?
- High-performance commodity networks like InfiniBand, with RDMA mechanisms, allow us to design very large HPC clusters with Petaflop performance
- Working on high-performance Message Passing Interface (MPI) software over InfiniBand (open-source MVAPICH project) for the last ten years
  - Used by more than 1,650 organizations in 63 countries
  - Empowering many TOP500 systems and emerging Petaflop systems: 111,104-core NASA Pleiades (7th ranked) and 62,976-core TACC Ranger (17th ranked)
  - Available with Red Hat, SuSE and other Linux distros


3 Can New Data Management Systems be Designed with High-Performance Networks and Protocols?
Current Design: Application, Sockets, 1/10 GigE Network
Enhanced Designs: Application, Accelerated Sockets (Verbs / Hardware Offload), 1 GigE or InfiniBand
Our Approach: Application, OSU Module (Verbs Interface), 1 GigE or InfiniBand
- Sockets are not designed for high performance
  - Stream semantics are often a mismatch for upper layers (Memcached, HBase, Hadoop)
  - Zero-copy is not available for non-blocking sockets
- Interesting interplay between memory, storage and interconnect
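The stream-semantics mismatch can be illustrated with a minimal Python sketch. A stream socket delivers a sequence of bytes, not discrete messages, so a request/response layer has to add its own framing on top. The 4-byte length prefix and the helper names (`send_msg`, `recv_exact`, `recv_msg`, `demo`) are assumptions made for illustration; they are not taken from the slides or from Memcached, HBase or Hadoop.

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Send one message as a 4-byte big-endian length followed by the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer than requested."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    """Recover one framed message from the byte stream."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


def demo() -> list:
    """Send two small requests back to back and read them as separate messages."""
    a, b = socket.socketpair()  # connected pair of stream sockets
    try:
        send_msg(a, b"get key1")
        send_msg(a, b"get key2")
        return [recv_msg(b), recv_msg(b)]
    finally:
        a.close()
        b.close()
```

Without the length prefix, the receiver could see both requests merged into one `recv()` result or split across several, which is the extra work a message-oriented upper layer has to do on top of sockets.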

4 Memcached Get transactions per second for 4K bytes
[Figure: system architecture. Web Frontend Servers (Memcached Clients), (Memcached Servers), (Database Servers), with links labeled "High Performance"; (Aggregated Memcached)]
[Charts: transactions per second (TPS) for 8 and 16 clients. Intel Clovertown Cluster (IB: DDR): SDP, 1G-TOE, OSU Design. Intel Westmere Cluster (IB: QDR): SDP, IPoIB, OSU Design]
- On IB DDR: about 33K/s for 16 clients; almost a factor of four improvement over 1GE (TOE)
- On IB QDR: about 842K/s for 16 clients; almost a factor of seven improvement over IPoIB
J. Jose, H. Subramoni, M. Luo, M. Zhang, J. Huang, M. W. Rahman, N. S. Islam, X. Ouyang, H. Wang, S. Sur and D. K. Panda, Memcached Design on High Performance RDMA Capable Interconnects, Int'l Conference on Parallel Processing (ICPP '11), Sept
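For readers unfamiliar with what one "Get transaction" involves, here is a minimal sketch of a single-key get in the standard memcached text protocol as it is used over sockets. The function names `build_get` and `parse_get_response` are hypothetical, and the RDMA-based design measured in the slide may exchange data differently; this only shows the request/response shape being counted.

```python
def build_get(key: str) -> bytes:
    """Encode a single-key get request in the memcached text protocol."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw: bytes):
    """Return the value bytes of a single-key get reply, or None on a miss.

    A hit looks like:  VALUE <key> <flags> <bytes>\r\n<data>\r\nEND\r\n
    A miss looks like: END\r\n
    """
    if raw == b"END\r\n":
        return None
    header, _, rest = raw.partition(b"\r\n")
    parts = header.split()
    if len(parts) < 4 or parts[0] != b"VALUE":
        raise ValueError("unexpected response header")
    size = int(parts[3])
    value = rest[:size]
    if rest[size:] != b"\r\nEND\r\n":
        raise ValueError("malformed response")
    return value
```

Each transaction is one such request plus its reply, so the per-message cost of the transport largely determines the transactions-per-second figures.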

5 HBase Get Operation - Throughput
[Figure: HBase system architecture. (HBase Clients), (HRegion Servers), (Data Nodes), with links labeled "High Performance"]
[Charts: operations/sec vs. message size (1K, 4K). Intel Clovertown Cluster (IB: DDR): 1GE, IPoIB, 1GE, OSU Design. Intel Westmere Cluster (IB: QDR)]
- IB DDR: 1K bytes 65 us (15K TPS); 4K bytes [...] us (11K TPS); almost a factor of two improvement over 1GE (TOE)
- IB QDR: 1K bytes 47 us (22K TPS); 4K bytes [...] us (16K TPS); almost a factor of four improvement over IPoIB for 1KB
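The latency and TPS figures quoted together on this slide are related by simple arithmetic. The sketch below assumes one client issuing one operation at a time, so throughput is the reciprocal of per-operation latency; that assumption and the function name `tps_from_latency_us` are not stated in the slides.

```python
def tps_from_latency_us(latency_us: float) -> float:
    """Operations per second for one client with one outstanding operation."""
    if latency_us <= 0:
        raise ValueError("latency must be positive")
    return 1_000_000 / latency_us


# 65 us per operation is roughly 15.4K operations per second,
# and 47 us per operation is roughly 21.3K operations per second.
ddr_1k = tps_from_latency_us(65)
qdr_1k = tps_from_latency_us(47)
```

Under that assumption, 65 us matches the quoted 15K TPS, and 47 us lands close to the quoted 22K TPS; small differences would be expected from rounding of the reported latencies.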

6 HDFS
[Figure: HDFS system architecture. (HDFS Clients), (HDFS Data Nodes), with links labeled "High Performance"]
[Chart: execution time (sec) vs. number of data nodes; series over 1GE, IPoIB, SDP and 1GE-TOE, two variants each]
- Sort: baseline benchmark for Hadoop
- Sort phase: I/O bound; Reduce phase: communication bound
- [...] improves performance by 28% using 1GigE with two DataNodes
- Benefit of 5% on four DataNodes using SDP, IPoIB or 1GigE
S. Sur, H. Wang, J. Huang, X. Ouyang and D. K. Panda, Can High-Performance Interconnects Benefit Hadoop Distributed File System?, Workshop on Micro Architectural Support for Virtualization, Data Center Computing and Clouds, in conjunction with MICRO 2010, Dec 2010, Atlanta, GA, USA

7 SSD-Assisted Hybrid Memory
[Diagram: CPU, RAM, RAM/SSD Hybrid Memory]
[Charts: random read latency and random write latency (us) vs. object sizes (bytes, up to 4K), Hybrid vs. VMS]
- Hybrid: RAM/SSD Hybrid Memory
- VMS: SSD as Virtual Memory Swap Device
- SSD: Fusion-io ioDrive SLC 8GB
- Random Read:
  - 1 KB object: Hybrid is 3.6X faster than VMS
  - 4 KB object: Hybrid is 3.8X faster than VMS
- Random Write:
  - 1 KB object: Hybrid is 7.X faster than VMS
  - 4 KB object: Hybrid is 24.7X faster than VMS

8 Memcached + Hybrid Memory
[Figure label: High Performance (Aggregated Memcache)]
[Charts: Memcached Get latency and Memcached Put latency (us) vs. object sizes (bytes, up to 4K); series: InfiniBand-Verbs, InfiniBand-IPoIB, 1 GigE, 1 GigE; IB DDR, with Hybrid Memory]
- Memcached Get with InfiniBand-Verbs: 1 KB object: IB is 1.5X faster than 1GigE
- Memcached Put with InfiniBand-Verbs: 1 KB object: IB is 2.9X faster than 1GigE

9 Conclusion
- High-performance networks like InfiniBand and RDMA protocols, together with SSDs, are opening up new ways to design modern enterprise systems
  - Aggregation of memory across nodes (Memcached)
  - Aggregation of memory and SSD (Hybrid Memory with SSD in a node, and Memcached + Hybrid Memory)
  - High-performance designs for HBase and HDFS
- Potential to design next-generation high-performance and scalable data management systems

10 Thank You!
Network-Based Computing Laboratory
MVAPICH Web Page
