Exploring the scalability of RPC with oslo.messaging


[Diagram: RPC clients and RPC servers connected through the messaging infrastructure; requests flow from clients to servers, responses flow back.]

Request-response:
- Each request goes to one of the servers.
- A 'worker' includes both a client and a server.
- Clients make requests concurrently.
- Clients record throughput and latency.
- The test fails if a response is not received within 2 seconds of the request.
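The pattern above can be sketched with stdlib queues standing in for the real messaging infrastructure. This is purely illustrative (the actual benchmark code is the ombt tool); all names here are made up, and the "work" is just an echo.

```python
# Minimal stand-in for the request-response test: one echo server,
# one worker issuing concurrent-style calls and recording throughput
# and worst-case latency, failing if a response takes over 2 seconds.
import queue
import threading
import time

TIMEOUT = 2.0  # the test fails if a response takes longer than this


def rpc_server(requests: queue.Queue) -> None:
    """Echo server: takes (payload, reply_queue) pairs and replies."""
    while True:
        payload, reply_q = requests.get()
        if payload is None:          # shutdown sentinel
            return
        reply_q.put(payload)         # trivial "work": echo the request


def run_worker(requests: queue.Queue, n_calls: int):
    """Client half of a worker: issue calls, record rate and latency."""
    reply_q = queue.Queue()
    latencies = []
    start = time.monotonic()
    for i in range(n_calls):
        t0 = time.monotonic()
        requests.put((i, reply_q))
        reply_q.get(timeout=TIMEOUT)  # raises queue.Empty -> test failure
        latencies.append(time.monotonic() - t0)
    elapsed = time.monotonic() - start
    return n_calls / elapsed, max(latencies)


requests = queue.Queue()
threading.Thread(target=rpc_server, args=(requests,), daemon=True).start()
rate, worst = run_worker(requests, 1000)
requests.put((None, None))           # stop the server
print(f"{rate:.0f} RPCs/s, worst latency {worst * 1000:.2f} ms")
```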

Disclaimer There are lots of aspects of scale and lots of different use cases or variations that could be explored. This is an initial experiment that I hope provides some food for thought. It most certainly should not be considered comprehensive or conclusive.

Code for test: https://github.com/grs/ombt

- 2 machines for servers, both with 4 cores, both running Fedora 19
- 4 machines for clients, 2 with 12 cores, 2 with 16 cores, all running RHEL 7
- RabbitMQ 3.1.5
- Qpidd 0.28
- Qpid Dispatch 0.2 (with patch)
- oslo.messaging from https://github.com/flaper87/oslo.messaging/tree/gordon/

[Graph: RPCs per second per 'worker' (vertical axis) vs. number of workers (horizontal axis); lines: AMQP 1.0 driver with dispatch, AMQP 1.0 driver with qpidd, rabbit driver, qpid driver.]

This graph shows how the average rate of requests, on the vertical axis, varies as we increase the number of workers for the different configurations, i.e. as we move right along the horizontal axis. The cut-off in each case is the point at which the response time of some request rises above 2 seconds.
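The two quantities plotted, the per-worker rate and the cutoff, could be derived from raw samples roughly as follows. The function name and argument names are hypothetical, but the cutoff rule matches the one stated above (any response over 2 seconds fails the run):

```python
def summarize_run(latencies, n_workers, elapsed_s, timeout=2.0):
    """Per-worker request rate for one run, or None past the cutoff.

    latencies: per-request response times in seconds for the whole run
    n_workers: number of workers active during the run
    elapsed_s: wall-clock duration of the run in seconds
    """
    if max(latencies) > timeout:
        return None  # some response exceeded the timeout: run fails
    return len(latencies) / elapsed_s / n_workers  # RPCs/s per worker
```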

[Graph: the same data, zoomed in; lines: AMQP 1.0 driver with dispatch, AMQP 1.0 driver with qpidd, rabbit driver, qpid driver.]

The next graph just focuses in on a smaller 'slice' of the horizontal axis...

[Graph: RPCs per second per 'worker' vs. number of workers, zoomed; lines: AMQP 1.0 driver with dispatch, AMQP 1.0 driver with qpidd, rabbit driver, qpid driver, qpid driver with extended timeout.]

The point at which we start on the vertical axis is the maximum request-response rate, and it is latency sensitive. As we increase the workers, the rate seen by each client drops off. The rate of that degradation, and the point at which it starts, are key measures of scalability.

For the two configurations using the AMQP 1.0 driver, there is minimal degradation until we get to about 20 workers. For the rabbit driver, the degradation is more immediate. For the qpid driver, the degradation is somewhere in between that for the other two drivers, but we start significantly lower. Why?

[Graph: RPCs per second per 'worker' vs. number of workers, comparing the AMQP 1.0 driver with qpidd against the qpid driver, with and without an extended timeout.]

The configurations these two lines represent use the exact same broker. The only thing that differs is the driver (and the client library it uses).

The qpid driver has very poor latency due to extra (unnecessary) synchronous round trips arising from: (a) creating a sender for every message (and querying the node type in each case), and (b) using a synchronous send (for both request and response).

In fact, the cutoff first observed for the qpid driver was due to this extra latency. On the same machine as the broker, with reduced latency but also reduced CPU available, the cutoff happens much later. Increasing the timeout marginally shows a more accurate picture of the degradation from a bad starting point. (The same increase doesn't affect the cutoff for the other drivers.)
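A back-of-the-envelope model shows why those extra synchronous round trips matter: if every RPC must complete k network round trips in sequence, the best achievable per-worker rate is bounded by 1 / (k x RTT). The numbers below are illustrative assumptions, not measurements from this experiment:

```python
def max_rate(round_trips_per_rpc: int, rtt_s: float) -> float:
    """Upper bound on a single worker's RPC rate when each RPC pays
    round_trips_per_rpc sequential network round trips of rtt_s each."""
    return 1.0 / (round_trips_per_rpc * rtt_s)


rtt = 0.0005  # assumed 0.5 ms LAN round trip
lean = max_rate(1, rtt)  # one request-response round trip per RPC
# qpid driver style: create sender + query node type + sync sends
heavy = max_rate(4, rtt)
print(lean, heavy)  # the extra round trips divide the ceiling by 4
```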

[Graph: combined RPCs per second vs. number of workers; lines: AMQP 1.0 driver with dispatch, AMQP 1.0 driver with qpidd, rabbit driver, qpid driver, qpid driver with extended timeout.]

This graph shows the growth in aggregate request rate, i.e. the overall rate of requests through the broker from all the clients together. Initially the aggregate rate increases as we add workers. Eventually we reach a point beyond which the rate cannot increase; adding further workers then tends to reduce the overall rate. The maximum aggregate rate, and the point at which it is reached, are also key aspects of scalability.
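The shape of that aggregate curve can be captured by a simple saturation model: throughput grows linearly with workers until the intermediary's capacity is reached, then flattens, and in practice degrades slightly as extra workers add contention. All parameters here are illustrative assumptions:

```python
def aggregate_rate(n_workers, per_worker_rate, capacity, contention=0.0):
    """Modelled combined RPCs/s for n_workers clients sharing one broker.

    per_worker_rate: rate one worker achieves when uncontended (RPCs/s)
    capacity:        maximum rate the intermediary can sustain (RPCs/s)
    contention:      rate lost per worker beyond the saturation point
    """
    linear = n_workers * per_worker_rate
    if linear <= capacity:
        return linear  # below saturation: growth is linear in workers
    # past saturation: flat at capacity, minus a small contention cost
    excess = n_workers - capacity / per_worker_rate
    return max(0.0, capacity - contention * excess)
```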

[Graph: the same data, zoomed in.]

Again, we will focus in on a smaller 'slice' of the horizontal axis to better see some of the detail...

The configurations using the AMQP 1.0 driver increase fairly linearly with the number of workers up to about 20 workers, tailing off after 30 workers or so. The maximum aggregate rate is significantly higher than for either of the other two drivers. The rabbit driver shows a linear increase in aggregate rate up to about 60 or 70 workers, and flattens out at about 100 workers. The qpid driver shows much slower growth than the other drivers, but the increase continues to about 30 workers.

What are the limits, and can we get round them?

[Three graphs, repeating the per-worker rate and aggregate rate curves, one per limit.]

1. A limited number of workers we can support at the maximum rate of requests.
2. A limited achievable aggregate rate.
3. A limited number of workers we can support while staying within the defined maximum response time.

[The same three graphs.]

1. Can we delay the point at which the average rate seen by each worker begins to degrade?
2. Can we keep increasing the aggregate rate?
3. Can we extend the scale at which we can keep within a given maximum timeout?

Need to go beyond the limits of a single intermediating process

[Graph: average RPCs per second per 'worker' vs. number of workers; lines: rabbit driver with a 1-node cluster, rabbit driver with a 2-node cluster, AMQP 1.0 driver with 1 Qpid Dispatch router, AMQP 1.0 driver with 2 Qpid Dispatch routers.]

The point of significant degradation was postponed with the AMQP 1.0 driver and a Qpid Dispatch router pair, though the line is not quite flat. With the rabbit driver and a RabbitMQ cluster pair, the curve was shifted right a little, but there was no alteration in its basic shape.
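From the oslo.messaging side, moving from a single intermediary to a pair is largely a configuration change. A hedged sketch of what that could look like (host names and credentials are placeholders, and exact URL forms vary by driver and release):

```ini
[DEFAULT]
# rabbit driver against a two-node RabbitMQ cluster: the transport URL
# can list several hosts, which the driver uses for connection/failover.
transport_url = rabbit://guest:guest@broker-1:5672,guest:guest@broker-2:5672/

# For the AMQP 1.0 driver, the URL scheme selects the driver; pointing
# it at a Qpid Dispatch router would look something like:
# transport_url = amqp://router-1:5672
```

The interesting part is what happens behind that URL: a RabbitMQ cluster still queues each message on a particular node, whereas a Dispatch router network only routes messages, which is one plausible reason the curves above behave differently.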

[Graphs: combined RPCs per second vs. number of workers for the same four configurations, with zoomed views for the rabbit driver and the AMQP 1.0 driver.]

The maximum achievable aggregate rate was extended both with the rabbit driver and clustered RabbitMQ, and with the AMQP 1.0 driver and a Qpid Dispatch router network.

[Graph: average RPCs per second per 'worker' vs. number of workers, over an extended range.]

The number of workers we can support while staying within the maximum allowed response time was also extended, both with the rabbit driver and clustered RabbitMQ, and with the AMQP 1.0 driver and a Qpid Dispatch router network.