VoltDB vs. Redis Benchmark

Similar documents
A Single Source of Truth

VoltDB for Financial Services Technical Overview

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

VOLTDB + HP VERTICA. page

Architecture of a Real-Time Operational DBMS

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

Memory-Based Cloud Architectures

<Insert Picture Here> Oracle NoSQL Database A Distributed Key-Value Store

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Traditional RDBMS Wisdom is All Wrong -- In Three Acts. Michael Stonebraker

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

RIGHTNOW A C E

TrafficDB: HERE s High Performance Shared-Memory Data Store Ricardo Fernandes, Piotr Zaczkowski, Bernd Göttler, Conor Ettinoffe, and Anis Moussa

Software Defined Storage at the Speed of Flash. PRESENTATION TITLE GOES HERE Carlos Carrero Rajagopal Vaideeswaran Symantec

Aurora, RDS, or On-Prem, Which is right for you

April 21, 2017 Revision GridDB Reliability and Robustness

DATABASE SCALE WITHOUT LIMITS ON AWS

Whitepaper / Benchmark

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University

Aerospike Scales with Google Cloud Platform

PERFORMANCE CHARACTERIZATION OF MICROSOFT SQL SERVER USING VMWARE CLOUD ON AWS PERFORMANCE STUDY JULY 2018

IBM System Storage DS5020 Express

SONAS Best Practices and options for CIFS Scalability

Big and Fast. Anti-Caching in OLTP Systems. Justin DeBrabant

Isilon: Raising The Bar On Performance & Archive Use Cases. John Har Solutions Product Manager Unstructured Data Storage Team

Datacenter replication solution with quasardb

When, Where & Why to Use NoSQL?

Tailwind: Fast and Atomic RDMA-based Replication. Yacine Taleb, Ryan Stutsman, Gabriel Antoniu, Toni Cortes

Abstract. The Challenges. ESG Lab Review InterSystems IRIS Data Platform: A Unified, Efficient Data Platform for Fast Business Insight

Conceptual Modeling on Tencent s Distributed Database Systems. Pan Anqun, Wang Xiaoyu, Li Haixiang Tencent Inc.

MySQL Cluster Web Scalability, % Availability. Andrew

Presented by Nanditha Thinderu

IBM InfoSphere Streams v4.0 Performance Best Practices

Using the SDACK Architecture to Build a Big Data Product. Yu-hsin Yeh (Evans Ye) Apache Big Data NA 2016 Vancouver

Paperspace. Architecture Overview. 20 Jay St. Suite 312 Brooklyn, NY Technical Whitepaper

Flash Storage Complementing a Data Lake for Real-Time Insight

IBM MQ Appliance HA and DR Performance Report Model: M2001 Version 3.0 September 2018

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

[This is not an article, chapter, of conference paper!]

Introduction to the Active Everywhere Database

NewSQL Databases. The reference Big Data stack

MySQL & NoSQL: The Best of Both Worlds

Benchmarking Cloud Serving Systems with YCSB 詹剑锋 2012 年 6 月 27 日

NoSQL Performance Test

MongoDB on Kaminario K2

A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores

IBM MQ Appliance HA and DR Performance Report Version July 2016

Adobe Acrobat Connect Pro 7.5 and VMware ESX Server

Ambry: LinkedIn s Scalable Geo- Distributed Object Store

Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card

The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput

WebSphere Application Server Base Performance

Revolutionizing the Datacenter Join the Conversation #OpenPOWERSummit

Building Consistent Transactions with Inconsistent Replication

CISC 7610 Lecture 5 Distributed multimedia databases. Topics: Scaling up vs out Replication Partitioning CAP Theorem NoSQL NewSQL

BENCHMARK: PRELIMINARY RESULTS! JUNE 25, 2014!

Upgrade Your MuleESB with Solace s Messaging Infrastructure

Making RAMCloud Writes Even Faster

EMC Business Continuity for Microsoft Applications

Introduction to Database Services

Introduction to Amazon Web Services

THE DEFINITIVE GUIDE FOR AWS CLOUD EC2 FAMILIES

EMC XTREMCACHE ACCELERATES MICROSOFT SQL SERVER

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Arachne. Core Aware Thread Management Henry Qin Jacqueline Speiser John Ousterhout

CGAR: Strong Consistency without Synchronous Replication. Seo Jin Park Advised by: John Ousterhout

ApsaraDB for Redis. Product Introduction

Massive Scalability With InterSystems IRIS Data Platform

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

(incubating) Introduction. Swapnil Bawaskar.

Running Databases in Containers.

HIGH PERFORMANCE SANLESS CLUSTERING THE POWER OF FUSION-IO THE PROTECTION OF SIOS

E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems

Solace JMS Broker Delivers Highest Throughput for Persistent and Non-Persistent Delivery

Meltdown and Spectre Interconnect Performance Evaluation Jan Mellanox Technologies

Oracle Database 12c: JMS Sharded Queues

Oracle NoSQL Database Overview Marie-Anne Neimat, VP Development

Managing IoT and Time Series Data with Amazon ElastiCache for Redis

Evaluation Report: HP StoreFabric SN1000E 16Gb Fibre Channel HBA

Next-Generation Cloud Platform

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

Virtualizing SQL Server 2008 Using EMC VNX Series and VMware vsphere 4.1. Reference Architecture

MySQL High Availability Solutions. Alex Poritskiy Percona

ADDENDUM TO: BENCHMARK TESTING RESULTS UNPARALLELED SCALABILITY OF ITRON ENTERPRISE EDITION ON SQL SERVER

EMC Backup and Recovery for Microsoft SQL Server

Acano solution. White Paper on Virtualized Deployments. Simon Evans, Acano Chief Scientist. March B

CLUSTERING HIVEMQ. Building highly available, horizontally scalable MQTT Broker Clusters

Microsoft SQL Server on Stratus ftserver Systems

Quobyte The Data Center File System QUOBYTE INC.

Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects?

JVM Performance Study Comparing Java HotSpot to Azul Zing Using Red Hat JBoss Data Grid

EMC Business Continuity for Microsoft SharePoint Server (MOSS 2007)

PARDA: Proportional Allocation of Resources for Distributed Storage Access

HCI: Hyper-Converged Infrastructure

EBOOK. FROM DISASTER RECOVERY TO ACTIVE-ACTIVE: NuoDB AND MULTI-DATA CENTER DEPLOYMENTS

Carbonite Availability. Technical overview

Nutanix Tech Note. Virtualizing Microsoft Applications on Web-Scale Infrastructure

Building Consistent Transactions with Inconsistent Replication

This project must be done in groups of 2 3 people. Your group must partner with one other group (or two if we have add odd number of groups).

Transcription:

Volt vs. Redis Benchmark Motivation and Goals of this Evaluation Compare the performance of several distributed databases that can be used for state storage in some of our applications Low latency is expected Persistence on disk is not required The comparison of the features of these databases is not covered in this presentation This is part of a technology investigation, not a product decision Popular tools such as YCSB (Yahoo! Cloud Serving Benchmark) focus on typical cloud workloads for key-value stores and NoSQL databases: access by primary key, no locking. But we want to compare the databases under a more demanding scenario Data Record Operation Used in this Test Our test scenario: Requires strict locking (opportunistic concurrency control is not allowed); Makes most database requests using secondary keys (a.k.a. alternate keys) instead of the primary key for each record. Partitioning is done on the primary key, so the secondary keys do not match the partitioning scheme. The following sequence (operation) is repeated by all test clients: 1. Select a secondary key 2. LockAndRead : Lock and read the record associated with that secondary key 3. Wait : Simulate the interaction with external nodes (delay from 1 to 5ms) to update the data 4. WriteAndUnlock : Update the modified record in the database and release the lock 5. Wait (1ms to 5ms) before accessing the same record again 4 times, then another record The time that we measure for each operation covers steps 1 to 4, including the wait of ~3ms.

Test Setup and Test Procedure BENCHMARK TOOL DATABASE CLIENT PARTITION 1 CONTROL SERVER PARTITION 2 CLIENT CLIENT PARTITION N Figure 1 Test Setup 3 client hosts and 3 server hosts, HP DL360 Gen6 2 quad-core CPUs: Intel Xeon E5540 @ 2.53GHz for a total of 16 hyperthreads (8 real cores) 72 GB RAM 2 * 10 Gbit + 2 * 1 Gbit Ethernet adapters, although most traffic goes through the 1st adapter connection to a dedicated switch (full non-blocking 10 Gbit) Test Procedure The control server launches several clients that measure several statistics and generate increasing load on the database until too many errors occur Many combinations of parameters are tested: number of clients, payload size, burstiness of the requests, rate of load increase, etc. Databases: Volt (Enterprise and Community editions), Redis and others. 2 BENCHMARKS VOLT VS. REDIS

Technical Procedure and Results Test procedure and results (1/2) The control server launches several client processes on the client hosts Each client generates requests to the and measures several statistics that are reported to the control server The load is progressively increased by all clients until too many errors occur Test procedure and results (2/2) Each test is repeated 30 to 90 times over 6-12 hours, increasing the number of clients and the payload size: 1 to 21 clients (21 clients = 7 per host; hosts have 8 physical cores) 100 to 4000 bytes of payload Requests sent in bursts of 10ms, 20ms, 50ms or more Graphs are generated after each test Around 130 to 300 graphs per series of tests Available on request if you are interested Figure 2 Databases Covered in this Report Volt, clients in C++ and Go Enterprise edition and Community edition We did not use command logging or other features Several partitioning setups were tested 14 sites per host (42 partitions) seemed to give the best results Redis, clients in C and Go Several other databases not disclosed here, with clients in C++, Java or Go 8 clients in total, plus 2 from older tests 3 BENCHMARKS VOLT VS. REDIS

Understanding the Throughput/Latency Results Left Y axis =Throughput Orange line + dots = total number of operations per second for all clients, increasing over time One operation = - lock + read, - wait, - write + unlock = several requests Right Y axis =Total delay Maximum delay out of range because graph is scaled on 90 th percentile 80 th percentile of delays 60 th percentile of delays Solid line = median = 50 th percentile of delays 40 th percentile of delays 20 th percentile of delays Minimum delay Figure 3 Test Results Volt Volt Results, No Replication 6 clients Test done without replication in the database 6 C++ clients The total delay (including average 3 ms wait step) is rather stable as the load increases An analysis of the CPU load shows that the client side is the limit, so we need more clients Figure 4: Ops/s and delay for 6 clients on 3 hosts, 100 bytes payload Volt 14 sites per host 4 BENCHMARKS VOLT VS. REDIS

Volt Results, No Replication 18 clients Test done without replication in the database 18 C++ clients Latency remains good until about 225K ops/s Maximum throughput around 350K ops/s (800K database transactions per second) when the average latency exceeds 20 ms Figure 5: Ops/s and delay for 18 clients on 3 hosts, 100 bytes per payload Volt 14 sites per host Volt Results, No Replication N clients With 3 or 6 clients, the limits in our tests come from the client side (maximum load on a few CPU cores) The results are very similar if the load comes from 9 or 21 clients: The maximum throughput is slightly below 350K ops/s Until around 225K ops/s, the average latency remains under 7ms ( comfort zone ) Figure 6: Influence of the number of clients on latency (payload: 100 bytes) 5 BENCHMARKS VOLT VS. REDIS

Volt Results, with Replication 18 clients Test done with replication factor 1 in the database 18 C++ clients Latency remains good until about 73K ops/s Maximum throughput around 105K ops/s (240K database transactions per second) when average latency exceeds 20 ms Figure 7: Ops/s and delay for 18 clients on 3 hosts, 100 bytes payload Volt (C++) repl, 14x3 sites Volt Results, with Replication N clients Test done with replication factor 1 in the database The results are very similar regardless of the number of clients: The maximum throughput is slightly above 120K ops/s Until around 70K ops/s, the average latency remains under 7ms The limit is the server CPU even with 3 clients Figure 8: Influence of the number of clients on latency (payload: 100 bytes) 6 BENCHMARKS VOLT VS. REDIS

Comparison with Other Databases Redis Results, with Replication N clients Replication factor 1, asynchronous Redis client written in Go Similar results for C client Latency lower than Volt but uses asynchronous replication: risk of inconsistencies in case of node or network failures Figure 9: Influence of the number of clients on latency (payload: 100 bytes) Database NNN, with Replication N clients Replication factor 1, synchronous Client written in Go Maximum throughput is 20-25% lower than Volt or Redis Latency increases too quickly and is already above 10ms for a moderate load Figure 10: Influence of the number of clients on latency (payload: 100 bytes) 7 BENCHMARKS VOLT VS. REDIS

Technical Summary Volt performance: Throughput comparable to Redis when replication is enabled, better without replication Better than the other databases that we tested Replication has a significant influence on the performance: throughput divided by 3 approx. Synchronous replication is slower but safer than asynchronous replication. It reduces the risk of inconsistencies in case of network failures. Client libraries are available for Java, C++, Go, à à Slightly different programming models used for each language (e.g., Java and Go rely on threads, C++ uses a single thread for I/O) 8 BENCHMARKS VOLT VS. REDIS

Executive Summary Good support received from Volt For all databases studied, the best results are obtained by adapting slightly the business logic of the application to take advantage of what each database does best: stored procedures, lightweight transactions, different partitioning models, etc. Comparison of features is out of scope for this presentation: replication, scalability, transaction capabilities, management and monitoring, recovery after failure, etc. Volt achieved performance and scale without sacrificing consistency Distributed data and computing for scaling performance, linear to allocated resources (CPU cores) True high availability with no data loss upon faults and failures within the cluster Synchronous durability to prevent data loss upon full cluster failure Disaster recovery for establishing failover clusters Cross DataCenter Replication (XDCR) for routing geo-local traffic with conflict resolution/notification Stored Procedures that allow complex logic to be implemented or imported in a simple Java class construct in less than 100 lines of code allowing ease of maintenance Co-locating code and data allows scale guaranteeing predictably low latency for a connected API driven data flow Import and export with pre-built connectors and extensibility for custom connectors Variety of client libraries (Java, C++, Golang etc.) Non-stop availability strategies throughout planned upgrade and maintenance activities Virtualization and Containerization ready Great support and a partner approach based engineering team About Volt Volt is the only in-memory transactional database for modern applications that require an unprecedented combination of data scale, volume, and accuracy. Unlike other databases, including OLTP, Big Data, and NoSQL, that force users to compromise, only Volt supports all three modern application data requirements: 1. Millions Volt processes a relentless volume of data points from users and data sources. 2. Milliseconds Volt ingests, analyzes, and acts on data in less than the blink of an eye. 3. 100% Data managed by Volt is always accurate, all the time, for all decisions. Telcos, Financial services, Ad Tech, Gaming, and other companies use Volt to modernize their applications. Volt is preparing energy, industrial, telco and other companies to meet the challenges of the IoT. Volt was founded by a team of world-class database experts, including Dr. Michael Stonebraker, winner of the coveted ACM Turing award. 5April2018 Volt, Inc. 209 Burlington Road, Suite 203, Bedford, MA 01730 voltdb.com