NoSQL Performance Test


In-Memory Performance Comparison of SequoiaDB, Cassandra, and MongoDB

White Paper by bankmark UG (haftungsbeschränkt), December 2014

bankmark UG (haftungsbeschränkt) | Bahnhofstraße 1, 94032 Passau, Germany
www.bankmark.de | info@bankmark.de | T +49 851 25 49 49 | F +49 851 25 49 499

CONTENTS

1 Introduction
2 Results Summary
3 Hardware and Software Configuration
  3.1 Cluster Hardware
  3.2 Cluster Software
4 Setup Procedure
  4.1 Cluster Kernel Parameters
  4.2 Apache Cassandra
  4.3 MongoDB
  4.4 SequoiaDB
  4.5 YCSB
5 Benchmark Setup
  5.1 Guidelines / Procedures
  5.2 Configuration Matrix
6 Benchmark Results
  6.1 Test Case I (200 Million Records / 20 Million Records per Node)
  6.2 Test Case II (100 Million Records / 10 Million Records per Node)
7 About the Authors

1 INTRODUCTION

In view of the fast development of innovative IT technologies, NoSQL technology has been increasingly adopted for big data and real-time web applications in recent years. Because NoSQL stores allow for a more agile development process and execution, they can replace traditional relational database management systems (RDBMS) in a large number of industrial application fields. NoSQL technology significantly improves both database scalability and usability by relaxing traditional RDBMS guarantees such as strong consistency and the relational model.

In this report, bankmark documents a large series of benchmark experiments comparing publicly available NoSQL store products with SequoiaDB in different workload scenarios. For this purpose, the bankmark team used the Yahoo! Cloud Serving Benchmark (YCSB) suite as the testing platform. The team used default settings for all systems wherever possible and only adapted settings that caused major performance bottlenecks. For all databases, the official documentation as well as information from other publicly available sources was used. All major adaptations are documented in this report; a full report containing all configuration settings is available on request.

The present report focuses on the performance of each database in different use cases and ensures a maximum of comparability between results. One aim of the experiments was to measure out-of-the-box performance. On the other hand, the distributed environment required some amount of optimization to get the systems running well in a clustered setup: all systems were configured for clustered use, and some tuning of partitioning / sharding took place to obtain competitive results for all systems. All tests were implemented by the bankmark team. All important details concerning the physical environment and the test settings are specified in this report; a full report ensuring repeatability of all experiments is available on request.

2 RESULTS SUMMARY

In our experiments, three systems were compared: SequoiaDB[1], Cassandra[2], and MongoDB[3]. All systems were tested on a 10 node cluster in an in-memory (raw data size about 1/4 of total RAM) or mostly-in-memory (raw data size about 1/2 of total RAM) setup. We used the widely accepted YCSB suite as the benchmarking platform. In all experiments, all data was replicated three times for fault tolerance. All tested workloads used skewed access distributions (Zipfian or latest). The detailed configuration can be seen below and in an extended report that is available on request.

The results do not show a clear winner across all experiments. Our tests in the mostly-in-memory setup show that Cassandra uses the most memory and therefore has to perform far more disk I/O in read-heavy workloads, leading to sharply decreased performance. In this setup, SequoiaDB outperforms the other systems in most cases, except for write-heavy workloads, which are dominated by Cassandra. In the pure in-memory setup (raw data size about 1/4 of total RAM), the performance of Cassandra and SequoiaDB is comparable, with SequoiaDB being faster for read requests and Cassandra being faster for write requests.

[1] http://www.sequoiadb.com/en/
[2] http://cassandra.apache.org/
[3] http://www.mongodb.org/

3 HARDWARE AND SOFTWARE CONFIGURATION

This section describes the software and hardware used in all experiments. The tests were performed on a cluster provided by SequoiaDB. All experiments were executed on physical hardware without any virtualization layer. The base system as well as the NoSQL systems were installed by trained professionals; bankmark had full root access to the cluster and reviewed all settings.

3.1 CLUSTER HARDWARE

All experiments were performed on a 10 node cluster (five Dell PowerEdge R520 servers and five Dell PowerEdge R720 servers) for the database systems, with five HP ProLiant BL465c blades as YCSB clients. The hardware configuration is listed below:

3.1.1 5x Dell PowerEdge R520 (server)
- 1x Intel Xeon E5-2420, 6 cores/12 threads, 1.9 GHz
- 47 GB RAM
- 6x 2 TB HDD, JBOD

3.1.2 5x Dell PowerEdge R720 (server)
- 1x Intel Xeon E5-2620, 6 cores/12 threads, 2.0 GHz
- 47 GB RAM
- 6x 2 TB HDD, JBOD

3.1.3 5x HP ProLiant BL465c (clients)
- 1x AMD Opteron 2378
- 4 GB RAM
- 3 GB logical HDD on an HP Smart Array E200i Controller, RAID 0

3.2 CLUSTER SOFTWARE

The cluster consists of Dell PowerEdge R520, Dell PowerEdge R720, and HP ProLiant BL465c blades as physical systems, which are equipped with different software. All information concerning the software in use and the corresponding versions is listed below.

3.2.1 Dell PowerEdge R520 and R720 (used as servers)
- OS: Red Hat Enterprise Linux Server 6.4
- Architecture: x86_64
- Kernel: 2.6.32
- Apache Cassandra: 2.1.2
- MongoDB: 2.6.5
- SequoiaDB: 1.8
- YCSB: 0.1.4 master (brianfrankcooper version at GitHub) with bankmark changes (see 4.5)

3.2.2 HP ProLiant BL465c (used as clients)
- OS: SUSE Linux Enterprise Server 11
- Architecture: x86_64
- Kernel: 3.0.13
- YCSB: 0.1.4 master (brianfrankcooper version at GitHub) with bankmark changes (see 4.5)
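The two test cases in Section 6 size the dataset relative to cluster RAM. As a quick arithmetic check of the percentages quoted there, the following Python sketch uses the 1128-byte average record size derived in Section 5 and treats the 47 GB per node as GiB (an assumption that reproduces the reported ratios):

```python
# Raw-data-to-RAM ratios for the two test cases (Sections 5 and 6).
RECORD_BYTES = 1128          # average record size reported in Section 5
RAM_BYTES = 10 * 47 * 2**30  # 10 database nodes with 47 GiB RAM each

for name, records in [("Test Case I", 200_000_000),
                      ("Test Case II", 100_000_000)]:
    raw = records * RECORD_BYTES
    print(f"{name}: {raw / 1e9:.0f} GB raw data = "
          f"{raw / RAM_BYTES:.0%} of total RAM")

# Prints:
# Test Case I: 226 GB raw data = 45% of total RAM
# Test Case II: 113 GB raw data = 22% of total RAM
```

These figures match the "about 45%" and "about 22%" stated in Sections 6.1 and 6.2.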

4 SETUP PROCEDURE

Three systems were benchmarked using YCSB: Apache Cassandra, MongoDB, and SequoiaDB. The following sections describe how these systems were installed. All systems were run with a replication factor of three and used three separate data disks; compression was activated wherever the system supported it.

4.1 CLUSTER KERNEL PARAMETERS

The following kernel parameters were changed for all systems (a programmatic sketch of applying them appears after Section 4.4):

vm.swappiness = 0
vm.dirty_ratio = 10
vm.dirty_background_ratio = 4
vm.dirty_expire_centisecs = 3
vm.vfs_cache_pressure = 2
vm.min_free_kbytes = 3949963
vm.max_map_count = 131072

4.2 APACHE CASSANDRA

Apache Cassandra was installed on all servers according to the official documentation[4] and configured with the recommended production settings[5]. Commit log and data were assigned to different disks (disk1 for the commit log; disk5 and disk6 for data).

4.3 MONGODB

MongoDB was installed by trained professionals. To use all three data disks and to perform replication on the cluster, a more complex schema following the official documentation for cluster setups[6] was implemented. Config servers were started on three of the cluster nodes. On all ten servers, a mongos instance (for sharding) was started, and each shard was added to the cluster. To use all three disks and to have three replicas, ten replica sets (dg0 to dg9) were distributed according to the following table (columns are cluster nodes):

       node1 node2 node3 node4 node5 node6 node7 node8 node9 node10
disk3  dg0   dg0   dg0   dg1   dg1   dg1   dg2   dg2   dg2   dg3
disk4  dg3   dg3   dg4   dg4   dg4   dg5   dg5   dg5   dg6   dg6
disk5  dg6   dg7   dg7   dg7   dg8   dg8   dg8   dg9   dg9   dg9

As MongoDB provides no mechanism to start a sharded cluster automatically, the manual startup procedure was implemented in the YCSB kit specifically for the 10 node cluster; a sketch of the shard-registration step also appears after Section 4.4.

4.4 SEQUOIADB

SequoiaDB was installed by trained professionals according to the official documentation[7], following the documentation for cluster setups[8]. SequoiaDB can start all instances through a cluster manager, so the preinstalled init script sdbcm could be used to start all services. Three of the system's nodes were chosen as catalog nodes. Three instances of SequoiaDB were started on each node, each accessing its own disk.

[4] http://www.datastax.com/documentation/cassandra/2.1/cassandra/install/installrhel_t.html
[5] http://www.datastax.com/documentation/cassandra/2.1/cassandra/install/installrecommendsettings.html
[6] http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/
[7] http://www.sequoiadb.com/en/document/1.8/installation/server_installation/topics/linux_en.html
[8] http://www.sequoiadb.com/en/document/1.8/installation/configuration_start/topics/cluster_en.html
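The Section 4.1 settings are normally applied with sysctl or via /etc/sysctl.conf. Purely as an illustration, the same effect can be achieved by writing to /proc/sys; this is a minimal sketch mirroring the values listed above, not bankmark's deployment tooling:

```python
# Apply the Section 4.1 kernel settings by writing to /proc/sys
# (equivalent to `sysctl -w`); requires root.
from pathlib import Path

PARAMS = {
    "vm/swappiness": "0",
    "vm/dirty_ratio": "10",
    "vm/dirty_background_ratio": "4",
    "vm/dirty_expire_centisecs": "3",
    "vm/vfs_cache_pressure": "2",
    "vm/min_free_kbytes": "3949963",
    "vm/max_map_count": "131072",
}

for key, value in PARAMS.items():
    Path("/proc/sys", key).write_text(value + "\n")
```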
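For Section 4.3, the shard-registration step can be illustrated with pymongo. This is a minimal sketch, not bankmark's actual startup scripting: the host names and ports are assumptions, and ycsb.usertable is the namespace YCSB's MongoDB driver conventionally uses.

```python
# Register the ten replica sets (dg0..dg9) as shards and shard the
# YCSB collection, talking to one of the ten mongos routers.
from pymongo import MongoClient

client = MongoClient("node1", 27017)   # hypothetical mongos host
admin = client.admin

for i in range(10):
    # Seed each shard with one member of its replica set (hypothetical hosts).
    admin.command("addshard", "dg%d/node%d:27018" % (i, i + 1))

# YCSB's MongoDB driver stores records in ycsb.usertable, keyed by _id.
admin.command("enablesharding", "ycsb")
admin.command("shardcollection", "ycsb.usertable", key={"_id": 1})
```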

4.5 YCSB

YCSB has several shortcomings. It is not well suited to running multiple YCSB instances on different hosts or to long-running, high-OPS workloads on machines with many physical cores. Furthermore, it is not very actively maintained. bankmark therefore made several extensions and modifications to the 0.1.4 version from the main repository. The most important changes:

- Added scripting to automate the tests
- Cassandra driver from jbellis (https://github.com/jbellis/ycsb)
- MongoDB driver from achille (https://github.com/achille/ycsb)
  - Added a batch insert function (provided by SequoiaDB)
  - Updated the driver interface implementation to the MongoDB 2.12 driver and added a property flag to activate "unordered inserts" in batch mode
- SequoiaDB driver from SequoiaDB
- Changes for multi-node setups and a bulk load option

5 BENCHMARK SETUP

The following generic and specific parameters were chosen for the benchmark runs:

- Ten servers (R520 and R720) hosted the database systems; five blades served as YCSB clients
- A sixth blade ran the control script
- Each database system wrote its data to three independent disks
- All experiments were run with replication factor 3

bankmark's YCSB kit provides workload files according to the tests defined in the statement of work (an example file is sketched below):

- workload1-warmup: single load, Zipfian distribution, 100% read
- workload1: bulk load (1k-record batches), Zipfian distribution, 100% read
- workload2-warmup: single load, Zipfian distribution, 50% read, 50% update
- workload2: bulk load (1k-record batches), Zipfian distribution, 50% read, 50% insert
- workload3-warmup: single load, Zipfian distribution, 5% read, 95% update
- workload3: bulk load (1k-record batches), Zipfian distribution, 5% read, 95% update
- workload4-warmup: single load, Zipfian distribution, 95% read, 5% update
- workload4: bulk load (1k-record batches), Zipfian distribution, 95% read, 5% update
- workload5-warmup: single load, latest distribution, 95% read, 5% update
- workload5: bulk load (1k-record batches), latest distribution, 95% read, 5% insert
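As an illustration, the measured-run file for workload2 might look as follows. The property names are standard YCSB CoreWorkload settings; the proportions, distribution, and record count follow the tables above, while the remaining values are YCSB defaults or assumptions (bankmark's exact files are part of the full report):

```properties
# workload2 (measured run): Zipfian request distribution, 50% read / 50% insert
workload=com.yahoo.ycsb.workloads.CoreWorkload

# Test Case I scale: 20 million records per node on 10 nodes
recordcount=200000000
# Runs were time-limited by bankmark's harness, so this is only an upper bound
operationcount=2000000000

readproportion=0.5
insertproportion=0.5
updateproportion=0
scanproportion=0
requestdistribution=zipfian

# YCSB defaults: 10 fields of 100 bytes (about 1128 bytes per record)
fieldcount=10
fieldlength=100
readallfields=true
```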

For the data load, either the workload[1-5]-warmup or the workload[1-5] file can be used, depending on the desired load type. Each of the five workloads is divided into a workload file for the measured result and a warmup file, which is run before the measured run. To avoid complications with YCSB's internal handling of record access, no inserts were performed during the warmup. Using a thread scaling experiment, we determined that 64 threads per YCSB instance worked best across all systems.

These were the remaining parameters used in the tests:

- Compression was used where possible
- Thread count per YCSB client: 64
- Generated records:
  - 200 million records (20 million per node) for Test Case I
  - 100 million records (10 million per node) for Test Case II
- Each record consists of one key user<ID> and ten fields Field<ID>. The default YCSB record size was used (10 fields of 100 bytes each), which results in an average of 1128 bytes per record (10 fields + field names + key)
- General benchmarking procedure for each key-value store (a sketch of such a driver script follows after Section 5.2):
  - Start database servers
  - Iterate over the five workloads defined in the provided workload files:
    - Perform the data single load (no time limit, workload file workload[1-5]-warmup)
    - Pause for 3 minutes to give each system time to perform any clean-up etc.
    - Run a 3 minute warmup of the workload (workload file workload[1-5]-warmup)
    - Run the workload for 3 minutes (workload file workload[1-5])
  - Stop database servers

5.1 GUIDELINES / PROCEDURES

All systems performed a single load, a warmup, and a measured run. For the systems that support a bulk load operation, namely MongoDB and SequoiaDB, an additional bulk load test was performed after the test completed.

5.2 CONFIGURATION MATRIX

Cassandra:
- Nodes: 10 instances (1 per node)
- Disks: commit log on disk1; data on disk5 and disk6
- Sharding / replication: 3 replicas (set at database creation)
- Compression: yes
- Consistency: read/write/scan/delete consistency level ONE
- Bulk load: no

MongoDB:
- Nodes: 10 mongos instances (1 per node), 30 mongod replica instances (3 per node), 3 config servers (on every 3rd node)
- Disks: replicas on disk3, disk4, disk5
- Sharding / replication: 10 shards with 3 replicas each
- Compression: no (not supported)
- Consistency: read preference "nearest", write concern "journaled"
- Bulk load: yes (1k records per batch)

SequoiaDB:
- Nodes: 10 SequoiaDB instances (1 per node), 30 replica instances (3 per node)
- Disks: replicas on disk3, disk4, disk5
- Sharding / replication: 10 shards with 3 replicas each
- Compression: yes
- Consistency: write concern "journaled" (not changeable)
- Bulk load: yes (1k records per batch)
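The general procedure above lends itself to scripting. Below is a minimal sketch in Python, assuming the YCSB 0.1.4 command-line client (java com.yahoo.ycsb.Client with -load and -t phases); the classpath, driver class, and the start/stop helper scripts are illustrative assumptions, and the 3-minute bounds on warmup and measured runs were enforced by bankmark's modified kit rather than by stock YCSB:

```python
# Orchestrate load -> pause -> warmup -> measured run for each workload.
import subprocess
import time

YCSB = ["java", "-cp", "ycsb.jar", "com.yahoo.ycsb.Client"]  # classpath: assumption
DB_CLASS = "com.yahoo.ycsb.db.MongoDbClient"  # driver class for the system under test

def ycsb(phase, workload_file):
    """phase is '-load' (data load) or '-t' (transaction phase)."""
    subprocess.run(YCSB + [phase, "-db", DB_CLASS, "-P", workload_file,
                           "-threads", "64", "-s"], check=True)

for n in range(1, 6):
    subprocess.run(["./start_servers.sh"], check=True)   # hypothetical helper
    ycsb("-load", f"workloads/workload{n}-warmup")       # single load, no time limit
    time.sleep(3 * 60)                                   # pause for clean-up
    ycsb("-t", f"workloads/workload{n}-warmup")          # warmup run
    ycsb("-t", f"workloads/workload{n}")                 # measured run
    subprocess.run(["./stop_servers.sh"], check=True)    # hypothetical helper
```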

6 BENCHMARK RESULTS

6.1 TEST CASE I (200 MILLION RECORDS / 20 MILLION RECORDS PER NODE)

In this experiment, the raw data size is about 45% of the cluster's total RAM.

6.1.1 Load

[Figure: throughput chart]

6.1.2 Bulk load (batches of 1,000 records)

[Figure: throughput chart]

6.1.3 Workload 1, zipfian, 100% read

[Figure: throughput chart]

6.1.4 Workload 2, zipfian, 50% read, 50% update

[Figure: throughput chart]

6.1.5 Workload 3, zipfian, 5% read, 95% update

[Figure: throughput chart]

6.1.6 Workload 4, zipfian, 95% read, 5% update

[Figure: throughput chart]

6.1.7 Workload 5, latest distribution, 95% read, 5% insert

[Figure: throughput chart]

6.2 TEST CASE II (100 MILLION RECORDS / 10 MILLION RECORDS PER NODE)

In this experiment, the raw data size is about 22% of the cluster's total RAM.

6.2.1 Load

[Figure: throughput chart]

6.2.2 Bulk load

[Figure: throughput chart]

6.2.3 Workload 1, zipfian, 100% read

[Figure: throughput chart]

6.2.4 Workload 2, zipfian, 50% read, 50% update

[Figure: throughput chart]

6.2.5 Workload 3, zipfian, 5% read, 95% update

[Figure: throughput chart]

6.2.6 Workload 4, zipfian, 95% read, 5% update

[Figure: throughput chart]

6.2.7 Workload 5, latest distribution, 95% read, 5% insert

[Figure: throughput chart]

7 ABOUT THE AUTHORS

Tilmann Rabl is a postdoctoral fellow at the University of Toronto and CEO of bankmark. His research focuses on big data benchmarking and big data systems. Michael Frank is CTO of bankmark; he is the core developer of the Parallel Data Generation Framework (PDGF), which is the basis for industry standard benchmarks. Manuel Danisch is COO of bankmark. He is one of the main developers of the BigBench big data analytics benchmark and of the data generator for the Transaction Processing Performance Council's (TPC) benchmark TPC-DI.

bankmark is an independent benchmarking institution whose mission is to revolutionize big data benchmarking. Driven by innovative technology, bankmark creates performance and quality tests as well as proof-of-concept systems and completely simulated production systems efficiently and cost-effectively. Techniques based on cutting-edge scientific research allow for unprecedented quality and speed. bankmark is an independent member of the industry benchmark standardization organizations SPEC and TPC, and its technology is the basis for benchmarks such as TPC-DI and BigBench.