Supporting Strong Cache Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand
- Rosamund Allen
- 6 years ago
Transcription
1 Supporting Strong Cache Coherency for Active Caches in Multi-Tier Data-Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, S. Krishnamoorthy, J. Wu and D. K. Panda The Ohio State University
2 Presentation Outline Introduction/Motivation Design and Implementation Experimental Results Conclusions
3 Introduction: Fast Internet growth: number of users, amount of data, types of services. Several uses: e-commerce, online banking, online auctions, etc. Types of content: images, documents, audio clips, video clips, etc. - Static Content; stock quotes, online stores (Amazon), online banking, etc. - Dynamic Content (Active
4 Presentation Outline Introduction/Motivation Multi-Tier Data-Centers Active Caches InfiniBand Design and Implementation Experimental Results Conclusions
5 Multi-Tier Data-Centers Single Powerful Computers Clusters Low Cost to Performance Ratio Increasingly Popular Multi-Tier Data-Centers Scalability an important issue
6 A Typical Multi-Tier Data-Center [Diagram labels: Clients, WAN, Tier 0: Proxy Nodes, Tier 1: Application Servers (PHP), Tier 2: Database Servers, Web Servers (Apache)]
7 Tiers of a Typical Multi-Tier Data-Center: Proxy Nodes handle caching, load balancing, security, etc. Web Servers handle the HTML content. Application Servers handle dynamic content and provide services. Database Servers handle persistent storage.
8 Data-Center Characteristics [Diagram labels: Front-End Tiers, Computation, Back-End Tiers] The amount of computation required for processing each request increases as we go to the inner tiers of the data-center. Caching at the front tiers is an important factor for scalability.
9 Presentation Outline Introduction/Motivation Introduction Multi-Tier Data-Centers Active Caches InfiniBand Design and Implementation Experimental Results Conclusions
10 Caching: Can avoid re-fetching of content. Beneficial if requests repeat. Static content caching: well studied in the past, widely used. [Diagram labels: Number of Requests Decrease, Front-End Tiers, Back-End Tiers]
11 Active Caching: Dynamic data: stock quotes, scores, personalized content, etc. Simple caching methods not suited. Issues: consistency, coherency. [Diagram labels: User Request, Proxy Node, Cache, Back-End Data, Update]
12 Cache Consistency: Non-decreasing views of system state. Updates seen by all or none. [Diagram labels: Proxy Nodes, Back-End Nodes, User Requests, Update]
13 Cache Coherency: Refers to the average staleness of the document served from cache. Two models of coherence: bounded staleness (Weak Coherency) and strong or immediate (Strong Coherency).
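The two coherence models above can be contrasted with a minimal sketch. The names `CachedDoc`, `ok_weak` and `ok_strong`, and the use of a time bound to express bounded staleness, are assumptions made for illustration and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass
class CachedDoc:
    body: str
    version: int       # back-end version the cached copy was taken from
    fetched_at: float  # time the copy was cached, in seconds


def ok_weak(doc: CachedDoc, now: float, max_staleness: float) -> bool:
    """Bounded staleness: serve the copy while its age is within the bound."""
    return now - doc.fetched_at <= max_staleness


def ok_strong(doc: CachedDoc, current_version: int) -> bool:
    """Strong coherency: serve the copy only if it matches the current version."""
    return doc.version == current_version


# The back-end has already moved to version 4, but the copy is only 4 s old.
doc = CachedDoc(body="quote=10", version=3, fetched_at=100.0)
weak_result = ok_weak(doc, now=104.0, max_staleness=5.0)
strong_result = ok_strong(doc, current_version=4)
```

In this example the weak check would still serve the outdated copy, while the strong check rejects it.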
14 Strong Cache Coherency: An absolute necessity for certain kinds of data: online shopping, travel ticket availability, stock quotes, online auctions. Example: online banking cannot afford to show different values to different concurrent requests.
15 Caching policies [Table labels: No Caching, Client Polling, Invalidation*, TTL/Adaptive TTL; Consistency, Coherency] *D. Li, P. Cao, and M. Dahlin. WCIP: Web Cache Invalidation Protocol. IETF Internet Draft, November 2000.
16 Presentation Outline Introduction/Motivation Introduction Multi-Tier Data-Centers Active Caches InfiniBand Design and Implementation Experimental Results Conclusions
17 InfiniBand: High performance: low latency, high bandwidth. Open industry standard. Provides rich features: RDMA, remote atomic operations, etc. Targeted for data-centers. Transport layers: VAPI, IPoIB, SDP.
18 Performance [Figures: Latency (us) vs. Message Size and Throughput (MB/s) vs. Message Size, for IPoIB, SDP and VAPI] Low latencies of less than 5us achieved. Bandwidth over 840 MB/s. * SDP and IPoIB from Voltaire's Software Stack
19 Performance [Figure: Throughput (RDMA Read): Throughput (Mbps) vs. Message Size (bytes); series: Send CPU, Recv CPU, Throughput (Poll), Throughput (Event)] Receiver side CPU utilization is very low. Leveraging the benefits of one-sided communication.
20 Caching policies [Table labels: No Caching, Client Polling, Invalidation, TTL/Adaptive TTL; Consistency, Coherency]
21 Objective To design an architecture that very efficiently supports strong cache coherency on InfiniBand
22 Presentation Outline Introduction/Motivation Design and Implementation Experimental Results Conclusions
23 Basic Architecture: External modules are used. Module communication can use any transport. Versioning: application servers version dynamic data. Version value of data passed to front end with every request to back-end. Version maintained by front end along with cached value of response.
24 Mechanism: Cache hit: back-end version check. If version current, use cache. Invalidate data for failed version check. Cache miss: get data to cache, initialize local versions.
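The hit and miss paths described on this slide can be sketched as follows. This is a single-process illustration under stated assumptions: `BackEnd`, `FrontEndCache` and their method names are hypothetical, and the version check is an ordinary function call rather than a network operation.

```python
class BackEnd:
    """Hypothetical back-end that versions its dynamic data."""

    def __init__(self):
        self._data = {}
        self._version = {}

    def update(self, key, value):
        # Every update bumps the version of the affected item.
        self._data[key] = value
        self._version[key] = self._version.get(key, 0) + 1

    def current_version(self, key):
        return self._version.get(key, 0)

    def fetch(self, key):
        # The version is passed to the front end along with the response.
        return self._data[key], self._version[key]


class FrontEndCache:
    """Hypothetical front-end cache that keeps a version next to each value."""

    def __init__(self, backend):
        self.backend = backend
        self.entries = {}  # key -> (value, version)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None:
            value, version = entry
            # Cache hit: check the cached version against the back-end.
            if version == self.backend.current_version(key):
                self.hits += 1
                return value
            # Failed version check: invalidate the cached data.
            del self.entries[key]
        # Cache miss: get the data and initialize the local version.
        self.misses += 1
        value, version = self.backend.fetch(key)
        self.entries[key] = (value, version)
        return value


backend = BackEnd()
cache = FrontEndCache(backend)
backend.update("quote", 10)
first = cache.get("quote")   # miss, fills the cache
second = cache.get("quote")  # hit, version is current
backend.update("quote", 11)
third = cache.get("quote")   # version check fails, data is re-fetched
```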
25 Architecture [Diagram labels: Front-End, Back-End, Request, Cache Hit, Response, Cache Miss]
26 Design: Every server has an associated module that uses IPoIB, SDP or VAPI to communicate. VAPI: when a request arrives at the proxy, the VAPI module is contacted. The module reads the latest version of the data from the back-end using a one-sided RDMA Read operation. If versions do not match, the cached value is invalidated.
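The one-sided version check can be modelled, very loosely, in plain Python. This is not VAPI or real RDMA: a `bytearray` stands in for a registered memory region on the back-end, and `one_sided_read` simply reads that buffer without calling any back-end code, which is the property the slide relies on. All names and the 64-bit slot layout are assumptions for illustration.

```python
import struct

SLOT = struct.Struct("<Q")  # assumed layout: one unsigned 64-bit version per slot


class BackEndRegion:
    """Stand-in for a registered memory region holding version counters."""

    def __init__(self, n_slots):
        self.buf = bytearray(SLOT.size * n_slots)

    def bump(self, slot):
        # Back-end side: increment the version when the data changes.
        (version,) = SLOT.unpack_from(self.buf, slot * SLOT.size)
        SLOT.pack_into(self.buf, slot * SLOT.size, version + 1)


def one_sided_read(region, slot):
    """Read a version straight from the buffer; no back-end handler runs."""
    (version,) = SLOT.unpack_from(region.buf, slot * SLOT.size)
    return version


def version_matches(region, slot, cached_version):
    """Front-end side: compare the cached version with the one just read."""
    return one_sided_read(region, slot) == cached_version


region = BackEndRegion(n_slots=4)
region.bump(2)
cached_version = one_sided_read(region, 2)
still_valid = version_matches(region, 2, cached_version)
region.bump(2)
valid_after_update = version_matches(region, 2, cached_version)
```

When `version_matches` returns false, the proxy would invalidate the cached value, as in the mechanism slide.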
27 VAPI Architecture [Diagram labels: Front-End, Back-End, Request, Cache Hit, RDMA Read, Response, Cache Miss]
28 Implementation: Socket-based implementation: IPoIB and SDP are used. Back-end version check is done using two-sided communication from the module. Requests to read and update are mutually excluded at the back-end module to avoid simultaneous readers and writers accessing the same data. Minimal changes to existing software.
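The mutual exclusion of read and update requests at the back-end module can be sketched with a single lock. The slides do not say how the exclusion is implemented, so the use of `threading.Lock` and the names `VersionModule` and `run_updates` are assumptions for illustration only.

```python
import threading


class VersionModule:
    """Hypothetical back-end module serving version reads and updates."""

    def __init__(self):
        self._lock = threading.Lock()
        self._versions = {}

    def read_version(self, key):
        # Readers take the same lock as writers, so they never overlap.
        with self._lock:
            return self._versions.get(key, 0)

    def update(self, key):
        with self._lock:
            self._versions[key] = self._versions.get(key, 0) + 1
            return self._versions[key]


def run_updates(module, key, n_threads, per_thread):
    """Apply concurrent updates and return the final version."""

    def worker():
        for _ in range(per_thread):
            module.update(key)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return module.read_version(key)


final_version = run_updates(VersionModule(), "doc", n_threads=4, per_thread=250)
```

Because every request in this two-sided design is handled by the back-end module, each version check costs back-end CPU time, which is the limitation the one-sided RDMA Read design avoids.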
29 Presentation Outline Introduction/Motivation Design and Implementation Experimental Results Data-Center Throughput Data-Center Response Time Data-Center Break-up Zipf and WC Trace Throughput Conclusions
30 Experimental Test-bed: Eight dual 2.4GHz Xeon processor nodes; 64-bit 133MHz PCI-X interfaces; 512KB L2-Cache and 400MHz Front Side Bus; Mellanox InfiniHost MT23108 Dual Port 4x HCAs; MT43132 eight 4x port Switch; SDK version Firmware version 1.17
31 Data-Center: Performance [Figure: Data-Center Throughput: Transactions per second (TPS) vs. Number of Compute Threads, for No Cache, IPoIB, VAPI and SDP] The VAPI module can sustain performance even with heavy load on the back-end servers.
32 Data-Center: Performance [Figure: Data-Center Response Time: Response time (ms) vs. Number of Compute Threads, for No Cache, IPoIB, VAPI and SDP] The VAPI module responds faster even with heavy load on the back-end servers.
33 Response Time Breakup [Figures: "Response Time Splitup - 0 Compute Threads" and "Response Time Splitup Compute Threads"; Time (ms) for IPoIB, SDP and VAPI, split into Client Communication, Proxy Processing, Module Processing and Backend version check] Worst case module overhead less than 10% of the response time. Minimal overhead for VAPI based version check even for 200 compute threads.
34 Data-Center: Throughput [Figures: Throughput: Zipf distribution and Throughput: World Cup Trace; Transactions per Second (TPS) vs. Number of Compute Threads, for No Cache, IPoIB, VAPI and SDP] The drop in the throughput of VAPI in the World Cup trace is due to the higher penalty for cache misses under increased load. The VAPI implementation does better for the real trace too.
35 Conclusions: An architecture for supporting Strong Cache Coherence. External module based design: freedom in choice of transport, minimal changes to existing software. Sockets API inherent limitation: two-sided communication; high performance Sockets not the solution (SDP). Main benefit: one-sided nature of RDMA calls.
36 Web Pointers NBC home page {narravul, balaji, vaidyana, savitha, wuj,
S. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda. The Ohio State University
Architecture for Caching Responses with Multiple Dynamic Dependencies in Multi-Tier Data- Centers over InfiniBand S. Narravula, P. Balaji, K. Vaidyanathan, H.-W. Jin and D. K. Panda The Ohio State University
More informationAdvanced RDMA-based Admission Control for Modern Data-Centers
Advanced RDMA-based Admission Control for Modern Data-Centers Ping Lai Sundeep Narravula Karthikeyan Vaidyanathan Dhabaleswar. K. Panda Computer Science & Engineering Department Ohio State University Outline
More informationOptimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications
Optimized Distributed Data Sharing Substrate in Multi-Core Commodity Clusters: A Comprehensive Study with Applications K. Vaidyanathan, P. Lai, S. Narravula and D. K. Panda Network Based Computing Laboratory
More informationDesigning Next Generation Data-Centers with Advanced Communication Protocols and Systems Services
Designing Next Generation Data-Centers with Advanced Communication Protocols and Systems Services P. Balaji, K. Vaidyanathan, S. Narravula, H. W. Jin and D. K. Panda Network Based Computing Laboratory
More informationHigh Performance Distributed Lock Management Services using Network-based Remote Atomic Operations
High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations S. Narravula, A. Mamidala, A. Vishnu, K. Vaidyanathan, and D. K. Panda Presented by Lei Chai Network Based
More informationEvaluating the Impact of RDMA on Storage I/O over InfiniBand
Evaluating the Impact of RDMA on Storage I/O over InfiniBand J Liu, DK Panda and M Banikazemi Computer and Information Science IBM T J Watson Research Center The Ohio State University Presentation Outline
More informationImplementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand
Implementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand Jiuxing Liu and Dhabaleswar K. Panda Computer Science and Engineering The Ohio State University Presentation Outline Introduction
More informationDynamic Reconfigurability Support for providing Soft QoS Guarantees in Cluster-based Multi-Tier Data-Centers over InfiniBand
Dynamic Reconfigurability Support for providing Soft QoS Guarantees in Cluster-based Multi-Tier Data-Centers over InfiniBand S. KRISHNAMOORTHY, P. BALAJI, K. VAIDYANATHAN, H. -W. JIN AND D. K. PANDA Technical
More informationDesigning Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services. Presented by: Jitong Chen
Designing Next-Generation Data- Centers with Advanced Communication Protocols and Systems Services Presented by: Jitong Chen Outline Architecture of Web-based Data Center Three-Stage framework to benefit
More informationDesigning High Performance DSM Systems using InfiniBand Features
Designing High Performance DSM Systems using InfiniBand Features Ranjit Noronha and Dhabaleswar K. Panda The Ohio State University NBC Outline Introduction Motivation Design and Implementation Results
More informationSockets Direct Procotol over InfiniBand in Clusters: Is it Beneficial?
Sockets Direct Procotol over InfiniBand in Clusters: Is it Beneficial? P. Balaji S. Narravula K. Vaidyanathan S. Krishnamoorthy J. Wu D. K. Panda Computer and Information Science, The Ohio State University
More informationWorkload-driven Analysis of File Systems in Shared Multi-tier Data-Centers over InfiniBand
Workload-driven Analysis of File Systems in Shared Multi-tier Data-Centers over InfiniBand K. VAIDYANATHAN, P. BALAJI, H. -W. JIN AND D. K. PANDA Technical Report OSU-CISRC-12/4-TR65 Workload-driven Analysis
More informationPerformance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms
Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms Sayantan Sur, Matt Koop, Lei Chai Dhabaleswar K. Panda Network Based Computing Lab, The Ohio State
More informationCan Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems?
Can Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems? Sayantan Sur, Abhinav Vishnu, Hyun-Wook Jin, Wei Huang and D. K. Panda {surs, vishnu, jinhy, huanwei, panda}@cse.ohio-state.edu
More informationHigh Performance Distributed Lock Management Services using Network-based Remote Atomic Operations
High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations S. Narravula A. Mamidala A. Vishnu K. Vaidyanathan D. K. Panda Department of Computer Science and Engineering
More informationPerformance Evaluation of InfiniBand with PCI Express
Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Amith Mamidala Abhinav Vishnu Dhabaleswar K Panda Department of Computer and Science and Engineering The Ohio State University Columbus,
More informationSR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience
SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience Jithin Jose, Mingzhe Li, Xiaoyi Lu, Krishna Kandalla, Mark Arnold and Dhabaleswar K. (DK) Panda Network-Based Computing Laboratory
More informationMemory Scalability Evaluation of the Next-Generation Intel Bensley Platform with InfiniBand
Memory Scalability Evaluation of the Next-Generation Intel Bensley Platform with InfiniBand Matthew Koop, Wei Huang, Ahbinav Vishnu, Dhabaleswar K. Panda Network-Based Computing Laboratory Department of
More informationImproving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters
Improving Application Performance and Predictability using Multiple Virtual Lanes in Modern Multi-Core InfiniBand Clusters Hari Subramoni, Ping Lai, Sayantan Sur and Dhabhaleswar. K. Panda Department of
More informationPerformance Evaluation of RDMA over IP: A Case Study with Ammasso Gigabit Ethernet NIC
Performance Evaluation of RDMA over IP: A Case Study with Ammasso Gigabit Ethernet NIC HYUN-WOOK JIN, SUNDEEP NARRAVULA, GREGORY BROWN, KARTHIKEYAN VAIDYANATHAN, PAVAN BALAJI, AND DHABALESWAR K. PANDA
More informationPerformance Evaluation of InfiniBand with PCI Express
Performance Evaluation of InfiniBand with PCI Express Jiuxing Liu Server Technology Group IBM T. J. Watson Research Center Yorktown Heights, NY 1598 jl@us.ibm.com Amith Mamidala, Abhinav Vishnu, and Dhabaleswar
More informationDesigning Efficient Systems Services and Primitives for Next-Generation Data-Centers
Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers K. Vaidyanathan S. Narravula P. Balaji D. K. Panda Department of Computer Science and Engineering The Ohio State University
More informationMemcached Design on High Performance RDMA Capable Interconnects
Memcached Design on High Performance RDMA Capable Interconnects Jithin Jose, Hari Subramoni, Miao Luo, Minjia Zhang, Jian Huang, Md. Wasi- ur- Rahman, Nusrat S. Islam, Xiangyong Ouyang, Hao Wang, Sayantan
More informationDesigning High Performance Communication Middleware with Emerging Multi-core Architectures
Designing High Performance Communication Middleware with Emerging Multi-core Architectures Dhabaleswar K. (DK) Panda Department of Computer Science and Engg. The Ohio State University E-mail: panda@cse.ohio-state.edu
More informationUnified Runtime for PGAS and MPI over OFED
Unified Runtime for PGAS and MPI over OFED D. K. Panda and Sayantan Sur Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University, USA Outline Introduction
More informationDesign and Evaluation of Benchmarks for Financial Applications using Advanced Message Queuing Protocol (AMQP) over InfiniBand
Design and Evaluation of Benchmarks for Financial Applications using Advanced Message Queuing Protocol (AMQP) over InfiniBand Hari Subramoni, Gregory Marsh, Sundeep Narravula, Ping Lai, and Dhabaleswar
More informationRDMA Read Based Rendezvous Protocol for MPI over InfiniBand: Design Alternatives and Benefits
RDMA Read Based Rendezvous Protocol for MPI over InfiniBand: Design Alternatives and Benefits Sayantan Sur Hyun-Wook Jin Lei Chai D. K. Panda Network Based Computing Lab, The Ohio State University Presentation
More informationDesigning Power-Aware Collective Communication Algorithms for InfiniBand Clusters
Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters Krishna Kandalla, Emilio P. Mancini, Sayantan Sur, and Dhabaleswar. K. Panda Department of Computer Science & Engineering,
More informationStudy. Dhabaleswar. K. Panda. The Ohio State University HPIDC '09
RDMA over Ethernet - A Preliminary Study Hari Subramoni, Miao Luo, Ping Lai and Dhabaleswar. K. Panda Computer Science & Engineering Department The Ohio State University Introduction Problem Statement
More informationAsynchronous Zero-copy Communication for Synchronous Sockets in the Sockets Direct Protocol (SDP) over InfiniBand
In the workshop on Communication Architecture for Clusters (CAC); held in conjunction with IPDPS, Rhodes Island, Greece, April 6. Also available as Ohio State University technical report OSU-CISRC-/5-TR6.
More informationHigh Performance MPI on IBM 12x InfiniBand Architecture
High Performance MPI on IBM 12x InfiniBand Architecture Abhinav Vishnu, Brad Benton 1 and Dhabaleswar K. Panda {vishnu, panda} @ cse.ohio-state.edu {brad.benton}@us.ibm.com 1 1 Presentation Road-Map Introduction
More informationApplication-Transparent Checkpoint/Restart for MPI Programs over InfiniBand
Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand Qi Gao, Weikuan Yu, Wei Huang, Dhabaleswar K. Panda Network-Based Computing Laboratory Department of Computer Science & Engineering
More information10-Gigabit iwarp Ethernet: Comparative Performance Analysis with InfiniBand and Myrinet-10G
10-Gigabit iwarp Ethernet: Comparative Performance Analysis with InfiniBand and Myrinet-10G Mohammad J. Rashti and Ahmad Afsahi Queen s University Kingston, ON, Canada 2007 Workshop on Communication Architectures
More informationExploiting RDMA operations for Providing Efficient Fine-Grained Resource Monitoring in Cluster-based Servers
Exploiting RDMA operations for Providing Efficient Fine-Grained Resource Monitoring in Cluster-based Servers K. Vaidyanathan Comp. Science and Engg., Ohio State University vaidyana@cse.ohio-state.edu H.
More informationUnifying UPC and MPI Runtimes: Experience with MVAPICH
Unifying UPC and MPI Runtimes: Experience with MVAPICH Jithin Jose Miao Luo Sayantan Sur D. K. Panda Network-Based Computing Laboratory Department of Computer Science and Engineering The Ohio State University,
More informationA Plugin-based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS
A Plugin-based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS Adithya Bhat, Nusrat Islam, Xiaoyi Lu, Md. Wasi- ur- Rahman, Dip: Shankar, and Dhabaleswar K. (DK) Panda Network- Based Compu2ng
More informationCan Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?
Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? N. S. Islam, X. Lu, M. W. Rahman, and D. K. Panda Network- Based Compu2ng Laboratory Department of Computer
More information* Department of Computer Science Jackson State University Jackson, MS 39217
The 2006 International Conference on Parallel & Distributed Processing Techniques & Applications, Las Vegas, Nevada, June 2006 Performance Analysis of Network Storage Manager System Using DAFS over InfiniBand
More informationDB2 purescale: High Performance with High-Speed Fabrics. Author: Steve Rees Date: April 5, 2011
DB2 purescale: High Performance with High-Speed Fabrics Author: Steve Rees Date: April 5, 2011 www.openfabrics.org IBM 2011 Copyright 1 Agenda Quick DB2 purescale recap DB2 purescale comes to Linux DB2
More informationWhitepaper / Benchmark
Whitepaper / Benchmark Web applications on LAMP run up to 8X faster with Dolphin Express DOLPHIN DELIVERS UNPRECEDENTED PERFORMANCE TO THE LAMP-STACK MARKET Marianne Ronström Open Source Consultant iclaustron
More informationHigh-Performance Key-Value Store on OpenSHMEM
High-Performance Key-Value Store on OpenSHMEM Huansong Fu*, Manjunath Gorentla Venkata, Ahana Roy Choudhury*, Neena Imam, Weikuan Yu* *Florida State University Oak Ridge National Laboratory Outline Background
More informationMulti-Threaded UPC Runtime for GPU to GPU communication over InfiniBand
Multi-Threaded UPC Runtime for GPU to GPU communication over InfiniBand Miao Luo, Hao Wang, & D. K. Panda Network- Based Compu2ng Laboratory Department of Computer Science and Engineering The Ohio State
More informationHigh-Performance and Scalable Non-Blocking All-to-All with Collective Offload on InfiniBand Clusters: A study with Parallel 3DFFT
High-Performance and Scalable Non-Blocking All-to-All with Collective Offload on InfiniBand Clusters: A study with Parallel 3DFFT Krishna Kandalla (1), Hari Subramoni (1), Karen Tomko (2), Dmitry Pekurovsky
More informationUbiquitous and Mobile Computing CS 525M: Virtually Unifying Personal Storage for Fast and Pervasive Data Accesses
Ubiquitous and Mobile Computing CS 525M: Virtually Unifying Personal Storage for Fast and Pervasive Data Accesses Pengfei Tang Computer Science Dept. Worcester Polytechnic Institute (WPI) Introduction:
More informationNFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications
NFS/RDMA over 40Gbps iwarp Wael Noureddine Chelsio Communications Outline RDMA Motivating trends iwarp NFS over RDMA Overview Chelsio T5 support Performance results 2 Adoption Rate of 40GbE Source: Crehan
More informationMemory Management Strategies for Data Serving with RDMA
Memory Management Strategies for Data Serving with RDMA Dennis Dalessandro and Pete Wyckoff (presenting) Ohio Supercomputer Center {dennis,pw}@osc.edu HotI'07 23 August 2007 Motivation Increasing demands
More informationDiffusion TM 5.0 Performance Benchmarks
Diffusion TM 5.0 Performance Benchmarks Contents Introduction 3 Benchmark Overview 3 Methodology 4 Results 5 Conclusion 7 Appendix A Environment 8 Diffusion TM 5.0 Performance Benchmarks 2 1 Introduction
More informationReducing Network Contention with Mixed Workloads on Modern Multicore Clusters
Reducing Network Contention with Mixed Workloads on Modern Multicore Clusters Matthew Koop 1 Miao Luo D. K. Panda matthew.koop@nasa.gov {luom, panda}@cse.ohio-state.edu 1 NASA Center for Computational
More informationIntel Enterprise Processors Technology
Enterprise Processors Technology Kosuke Hirano Enterprise Platforms Group March 20, 2002 1 Agenda Architecture in Enterprise Xeon Processor MP Next Generation Itanium Processor Interconnect Technology
More informationBenefits of I/O Acceleration Technology (I/OAT) in Clusters
Benefits of I/O Acceleration Technology (I/OAT) in Clusters K. VAIDYANATHAN AND D. K. PANDA Technical Report Ohio State University (OSU-CISRC-2/7-TR13) The 27 IEEE International Symposium on Performance
More informationSummary Cache based Co-operative Proxies
Summary Cache based Co-operative Proxies Project No: 1 Group No: 21 Vijay Gabale (07305004) Sagar Bijwe (07305023) 12 th November, 2007 1 Abstract Summary Cache based proxies cooperate behind a bottleneck
More informationEVALUATING INFINIBAND PERFORMANCE WITH PCI EXPRESS
EVALUATING INFINIBAND PERFORMANCE WITH PCI EXPRESS INFINIBAND HOST CHANNEL ADAPTERS (HCAS) WITH PCI EXPRESS ACHIEVE 2 TO 3 PERCENT LOWER LATENCY FOR SMALL MESSAGES COMPARED WITH HCAS USING 64-BIT, 133-MHZ
More informationMark Falco Oracle Coherence Development
Achieving the performance benefits of Infiniband in Java Mark Falco Oracle Coherence Development 1 Copyright 2011, Oracle and/or its affiliates. All rights reserved. Insert Information Protection Policy
More informationEfficient and Truly Passive MPI-3 RMA Synchronization Using InfiniBand Atomics
1 Efficient and Truly Passive MPI-3 RMA Synchronization Using InfiniBand Atomics Mingzhe Li Sreeram Potluri Khaled Hamidouche Jithin Jose Dhabaleswar K. Panda Network-Based Computing Laboratory Department
More informationMVAPICH-Aptus: Scalable High-Performance Multi-Transport MPI over InfiniBand
MVAPICH-Aptus: Scalable High-Performance Multi-Transport MPI over InfiniBand Matthew Koop 1,2 Terry Jones 2 D. K. Panda 1 {koop, panda}@cse.ohio-state.edu trj@llnl.gov 1 Network-Based Computing Lab, The
More informationBe Fast, Cheap and in Control with SwitchKV. Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Goal: fast and cost-efficient key-value store Store, retrieve, manage key-value objects Get(key)/Put(key,value)/Delete(key) Target: cluster-level
More informationDesign Alternatives for Implementing Fence Synchronization in MPI-2 One-Sided Communication for InfiniBand Clusters
Design Alternatives for Implementing Fence Synchronization in MPI-2 One-Sided Communication for InfiniBand Clusters G.Santhanaraman, T. Gangadharappa, S.Narravula, A.Mamidala and D.K.Panda Presented by:
More informationPOWER7: IBM's Next Generation Server Processor
POWER7: IBM's Next Generation Server Processor Acknowledgment: This material is based upon work supported by the Defense Advanced Research Projects Agency under its Agreement No. HR0011-07-9-0002 Outline
More informationInfiniband and RDMA Technology. Doug Ledford
Infiniband and RDMA Technology Doug Ledford Top 500 Supercomputers Nov 2005 #5 Sandia National Labs, 4500 machines, 9000 CPUs, 38TFlops, 1 big headache Performance great...but... Adding new machines problematic
More informationBe Fast, Cheap and in Control with SwitchKV Xiaozhou Li
Be Fast, Cheap and in Control with SwitchKV Xiaozhou Li Raghav Sethi Michael Kaminsky David G. Andersen Michael J. Freedman Goal: fast and cost-effective key-value store Target: cluster-level storage for
More informationRDMA for Memcached User Guide
0.9.5 User Guide HIGH-PERFORMANCE BIG DATA TEAM http://hibd.cse.ohio-state.edu NETWORK-BASED COMPUTING LABORATORY DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING THE OHIO STATE UNIVERSITY Copyright (c)
More informationDemotion-Based Exclusive Caching through Demote Buffering: Design and Evaluations over Different Networks
1 Demotion-Based Exclusive Caching through Demote Buffering: Design and Evaluations over Different Networks Jiesheng Wu Pete Wyckoff Dhabaleswar K. Panda Dept. of Computer and Information Science Ohio
More informationCRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart
CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Hao Wang, Jian Huang, Dhabaleswar K. Panda Department of Computer
More informationIntroduction. Architecture Overview
Performance and Sizing Guide Version 17 November 2017 Contents Introduction... 5 Architecture Overview... 5 Performance and Scalability Considerations... 6 Vertical Scaling... 7 JVM Heap Sizes... 7 Hardware
More informationIn the multi-core age, How do larger, faster and cheaper and more responsive memory sub-systems affect data management? Dhabaleswar K.
In the multi-core age, How do larger, faster and cheaper and more responsive sub-systems affect data management? Panel at ADMS 211 Dhabaleswar K. (DK) Panda Network-Based Computing Laboratory Department
More informationIntra-MIC MPI Communication using MVAPICH2: Early Experience
Intra-MIC MPI Communication using MVAPICH: Early Experience Sreeram Potluri, Karen Tomko, Devendar Bureddy, and Dhabaleswar K. Panda Department of Computer Science and Engineering Ohio State University
More informationFlash: an efficient and portable web server
Flash: an efficient and portable web server High Level Ideas Server performance has several dimensions Lots of different choices on how to express and effect concurrency in a program Paper argues that
More informationApplication Acceleration Beyond Flash Storage
Application Acceleration Beyond Flash Storage Session 303C Mellanox Technologies Flash Memory Summit July 2014 Accelerating Applications, Step-by-Step First Steps Make compute fast Moore s Law Make storage
More informationFROM HPC TO THE CLOUD WITH AMQP AND OPEN SOURCE SOFTWARE
FROM HPC TO THE CLOUD WITH AMQP AND OPEN SOURCE SOFTWARE Carl Trieloff cctrieloff@redhat.com Red Hat Lee Fisher lee.fisher@hp.com Hewlett-Packard High Performance Computing on Wall Street conference 14
More informationEvaluation of Strong Consistency Web Caching Techniques
World Wide Web: Internet and Web Information Systems, 5, 95 123, 2002 2002 Kluwer Academic Publishers. Manufactured in The Netherlands. Evaluation of Strong Consistency Web Caching Techniques L. Y. CAO
More informationPOWER7: IBM's Next Generation Server Processor
Hot Chips 21 POWER7: IBM's Next Generation Server Processor Ronald Kalla Balaram Sinharoy POWER7 Chief Engineer POWER7 Chief Core Architect Acknowledgment: This material is based upon work supported by
More informationAccelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K.
Accelerating MPI Message Matching and Reduction Collectives For Multi-/Many-core Architectures Mohammadreza Bayatpour, Hari Subramoni, D. K. Panda Department of Computer Science and Engineering The Ohio
More informationCascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching
Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value
More informationA Practical Scalable Distributed B-Tree
A Practical Scalable Distributed B-Tree CS 848 Paper Presentation Marcos K. Aguilera, Wojciech Golab, Mehul A. Shah PVLDB 08 March 8, 2010 Presenter: Evguenia (Elmi) Eflov Presentation Outline 1 Background
More informationArchitecture of a Real-Time Operational DBMS
Architecture of a Real-Time Operational DBMS Srini V. Srinivasan Founder, Chief Development Officer Aerospike CMG India Keynote Thane December 3, 2016 [ CMGI Keynote, Thane, India. 2016 Aerospike Inc.
More informationMicro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects
Micro-Benchmark Level Performance Comparison of High-Speed Cluster Interconnects Jiuxing Liu Balasubramanian Chandrasekaran Weikuan Yu Jiesheng Wu Darius Buntinas Sushmitha Kini Peter Wyckoff Dhabaleswar