NoSQL Databases: MongoDB vs Cassandra
Kenny Huynh, Andre Chik, Kevin Vu

Introduction
- Relational database model
  - Concept developed in 1970
  - Can become inefficient as data volume and application complexity grow
- NoSQL
  - Term "NoSQL" first used in 1998 and popularized in 2009
  - Related to Google's Bigtable and Amazon's Dynamo

Purpose
- Experiment
  - Use a single, low-capacity server to run a mix of read/update queries and compare performance
- Compare MongoDB and Cassandra
  - Differences between the two models
- Benchmarking
  - Performance
  - Understand how execution time is affected by database size

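NoSQL Databases
- Follow the BASE principle (Basically Available, Soft state, Eventually consistent) rather than ACID (Atomicity, Consistency, Isolation, Durability)
- Types
  - Key-Value Store
  - Document Store
  - Column-Family
  - Graph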

Types of NoSQL Databases
- Key-Value Store: Azure Table Storage, Redis
- Document Store: MongoDB, CouchDB
- Column-Family: Cassandra, Accumulo
- Graph: Neo4j, InfiniteGraph
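To make the four data models concrete, the sketch below stores one hypothetical user record in each of them using plain Python structures. It only illustrates the shape of the data; none of it uses the real client APIs of Redis, MongoDB, Cassandra or Neo4j, and the record, keys and values are invented for illustration.

```python
import json

# Hypothetical "user" record expressed in each of the four NoSQL data models.

# Key-value store (e.g. Redis): an opaque value reachable only through its key.
key_value = {"user:42": '{"name": "Ada", "city": "London"}'}

# Document store (e.g. MongoDB): a nested, self-describing document whose
# inner fields can be queried and indexed by the database.
document = {"_id": 42, "name": "Ada", "address": {"city": "London"}, "tags": ["admin"]}

# Column-family store (e.g. Cassandra): rows keyed by a row key, each holding
# named columns; rows need not share the same set of columns.
column_family = {
    "users": {
        42: {"name": "Ada", "city": "London"},
        43: {"name": "Grace"},
    }
}

# Graph store (e.g. Neo4j): nodes plus explicit, named relationships.
graph = {
    "nodes": {42: {"label": "User", "name": "Ada"}, 7: {"label": "City", "name": "London"}},
    "edges": [(42, "LIVES_IN", 7)],
}


def city_of_user_42():
    """Answer the same question ("where does user 42 live?") in each model."""
    return {
        # The key-value store returns a blob; the client has to decode it.
        "key_value": json.loads(key_value["user:42"])["city"],
        "document": document["address"]["city"],
        "column_family": column_family["users"][42]["city"],
        # The graph store follows the LIVES_IN edge to the city node.
        "graph": next(
            graph["nodes"][dst]["name"]
            for src, rel, dst in graph["edges"]
            if src == 42 and rel == "LIVES_IN"
        ),
    }


print(city_of_user_42())
```

The point of the comparison is where the structure lives: in the key-value case the database sees only a key and a blob, while the other three models expose fields, columns or relationships that the database itself can index and traverse.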

MongoDB vs Cassandra
- Describe MongoDB and Cassandra
- Differences between MongoDB and Cassandra

MongoDB
- Developed in C++ by 10gen (now MongoDB, Inc.)
- Document store
  - Schemaless
  - Documents stored in BSON (binary JSON) format
  - 16 MB maximum document size
  - Documents can be identified by field values of defined types, not just by _id
- Indexes
  - Automatic unique index on the _id field
  - Compound indexes
- Areas of use
  - Content management systems (CMS), comment storage
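The document-store behaviour listed above (schemaless documents, an automatic unique index on _id, optional compound indexes, a 16 MB size cap) can be sketched as a toy in-memory collection. ToyCollection is a hypothetical class, not MongoDB's implementation or the pymongo API: real _id values default to ObjectIds rather than a counter, and JSON length is used only as a rough stand-in for the encoded BSON size.

```python
import itertools
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # MongoDB's 16 MB per-document limit


class ToyCollection:
    """Hypothetical in-memory stand-in for a document collection."""

    def __init__(self):
        self._by_id = {}       # the automatic unique index on _id
        self._compound = {}    # field tuple -> {value tuple -> set of _ids}
        self._auto_ids = itertools.count(1)  # toy replacement for ObjectId

    @staticmethod
    def _key(doc, fields):
        return tuple(doc.get(f) for f in fields)

    def create_index(self, *fields):
        """Declare a compound index and build it from existing documents."""
        index = {}
        for doc in self._by_id.values():
            index.setdefault(self._key(doc, fields), set()).add(doc["_id"])
        self._compound[fields] = index

    def insert_one(self, doc):
        doc = dict(doc)  # schemaless: any set of fields is accepted
        if "_id" not in doc:
            doc["_id"] = next(self._auto_ids)
        if doc["_id"] in self._by_id:  # uniqueness enforced by the _id index
            raise KeyError(f"duplicate _id {doc['_id']!r}")
        # JSON size is only a rough proxy for the real BSON size.
        if len(json.dumps(doc, default=str).encode("utf-8")) > MAX_DOCUMENT_BYTES:
            raise ValueError("document exceeds the 16 MB limit")
        self._by_id[doc["_id"]] = doc
        for fields, index in self._compound.items():
            index.setdefault(self._key(doc, fields), set()).add(doc["_id"])
        return doc["_id"]

    def find(self, **criteria):
        """Use a compound index whose fields match the query exactly, else scan."""
        fields = tuple(criteria)
        if fields in self._compound:
            ids = self._compound[fields].get(tuple(criteria.values()), set())
            return [self._by_id[i] for i in sorted(ids, key=str)]
        return [d for d in self._by_id.values()
                if all(d.get(k) == v for k, v in criteria.items())]
```

For example, after `create_index("city", "age")`, a query on both city and age is answered from the index, while a query on city alone falls back to scanning every document. Real MongoDB can also use a compound index for a prefix of its fields; the toy deliberately does not.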

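Characteristics of MongoDB
- Most important characteristics are durability and concurrency
- Allows creation of replicas
  - Master-slave: one master and one or more slaves (replica sets call these the primary and secondaries)
  - The master handles reads and writes while the slaves serve as backups
  - When the master goes down, the slave with the most recent data is promoted to master
  - Replica members can be configured
- Locking
  - The WiredTiger storage engine (MongoDB 3.0 and later) provides document-level concurrency control; the 2.4 release benchmarked here used the memory-mapped storage engine with a database-level lock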
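The failover rule on this slide (promote the replica with the most recent data) can be sketched as below. The Member type and its last_applied_op counter are hypothetical. The majority check reflects the fact that a MongoDB replica set can only elect a new primary while a majority of its members are reachable; real elections also involve voting and member priorities, which this sketch omits.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str
    healthy: bool
    last_applied_op: int  # hypothetical counter: position of the newest replicated write


def promote_new_master(members):
    """Choose the healthy member with the most recent data as the new master.

    `members` is the whole replica set, including the failed master.
    """
    healthy = [m for m in members if m.healthy]
    # Without a reachable majority, no member may take over (avoids two masters).
    if len(healthy) * 2 <= len(members):
        raise RuntimeError("no majority reachable; no new master can be elected")
    # The member that has applied the most operations has the most recent data.
    return max(healthy, key=lambda m: m.last_applied_op)
```

For example, if the master of a three-member set fails, the two remaining members still form a majority and the one with the higher last_applied_op takes over; if two of the three are down, no promotion happens.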

Cassandra
- Developed in Java (originally at Facebook), now maintained by the Apache Software Foundation
- Column-family store
  - Data model resembles the relational model (tables of rows and columns)
  - Designed to store large amounts of data and interact with it efficiently
  - Data can be distributed and stored across clusters of nodes
- Areas of use
  - Banking, finance, logging

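Characteristics of Cassandra
- Most important characteristics are durability, high availability and scalability
- Peer-to-peer architecture (no master node)
- Can store petabytes of data
- Failed nodes can be replaced quickly
- Replication strategies
  - SimpleStrategy
  - NetworkTopologyStrategy
- Replication types
  - Synchronous
  - Asynchronous
  - The consistency level chosen per request decides how many replicas must acknowledge a write before it succeeds; the remaining replicas are updated in the background
- Indexes
  - Secondary indexes implemented as hidden tables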
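Cassandra's peer-to-peer distribution and SimpleStrategy replica placement can be sketched with a hash ring: each node owns a token, a key is stored on the first node whose token is at or after the key's hash (wrapping around), and further replicas go to the next nodes clockwise. This is a simplified model under stated assumptions: it hashes with MD5 (Cassandra 1.2's default partitioner is Murmur3), gives each node a single token (real clusters usually use virtual nodes), and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key (or node name) onto the ring; MD5 is used only for simplicity."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes):
        # One token per node, kept sorted so the ring can be searched with bisect.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, key: str, replication_factor: int):
        """SimpleStrategy-style placement: owner of the key's token, then the next nodes clockwise."""
        if replication_factor > len(self.entries):
            raise ValueError("replication factor exceeds number of nodes")
        # First node whose token is >= the key's token; wrap to the start past the end.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(replication_factor)]


ring = Ring(["node1", "node2", "node3", "node4"])
print(ring.replicas("user1234123", 3))
```

Because every node can compute the same placement from the key alone, any node can coordinate a request, which is what makes the peer-to-peer design possible; replacing a failed node only affects the token ranges that node owned.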

Feature Comparison
- Similar core properties: locking, file types, querying, transactions, data storage and supported operating systems
- MongoDB
  - Better for frequently written data and dynamic queries
  - Queries are written as JSON-like documents
- Cassandra
  - Optimized for storing and interacting with large amounts of data
  - CQL (Cassandra Query Language), based on SQL
- Main difference (CAP theorem)
  - MongoDB is a CP system (consistency and partition tolerance)
  - Cassandra is an AP system (availability and partition tolerance)
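The CP/AP distinction can be illustrated with replica arithmetic. The functions below are a hypothetical, simplified sketch, not either database's API: a CP-style system only accepts writes on the side of a partition that can reach a majority (as a MongoDB primary requires), while an AP-style system such as Cassandra writing at consistency level ONE keeps accepting writes as long as any replica is reachable. Because Cassandra's consistency level is chosen per request, reading and writing at QUORUM (so that R + W > RF) moves it toward the consistent end of the trade-off.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas (Cassandra's QUORUM level; also what a MongoDB primary needs)."""
    return replication_factor // 2 + 1


def accepts_write(system: str, replication_factor: int, reachable: int) -> bool:
    """Simplified: can this side of a network partition still accept a write?"""
    if system == "CP":  # e.g. MongoDB: a primary exists only where a majority is reachable
        return reachable >= quorum(replication_factor)
    if system == "AP":  # e.g. Cassandra writing at consistency level ONE
        return reachable >= 1
    raise ValueError(f"unknown system type {system!r}")


def read_overlaps_write(read_acks: int, write_acks: int, replication_factor: int) -> bool:
    """Every read replica set is guaranteed to intersect every write set iff R + W > RF."""
    return read_acks + write_acks > replication_factor


# Three replicas, and a partition leaves only one of them reachable from the client.
for system in ("CP", "AP"):
    print(system, accepts_write(system, replication_factor=3, reachable=1))
```

The AP side stays writable during the partition, but a later read at level ONE may hit a replica that has not yet received the write; that is the "eventually consistent" half of the BASE principle.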

Benchmark
- YCSB (Yahoo! Cloud Serving Benchmark)
- 6 workloads: A, B, C, F, G, H (A, B, C and F are YCSB core workloads; G and H are additional update-heavy workloads)
- 32-bit Ubuntu VM with 2 GB RAM
- Windows 7, 4 GB RAM
- Intel Core 2 Quad
- MongoDB 2.4.3
- Cassandra 1.2.4

Benchmark Process
- Three data sets: 100K, 280K and 700K records
- 1 record = 1 KB (10 fields)
  - Each field contains random characters (e.g. user1234123)
- Each workload
  - Executed 3 times, restarting the computer after each run
  - Result is the average of the 3 runs
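The benchmark parameters above map onto YCSB's CoreWorkload properties. The helper below is a hypothetical convenience that renders a properties file for each workload mix. The property names (recordcount, operationcount, fieldcount, fieldlength and the *proportion keys) are YCSB's own, but the operation count is not given in the slides and is left as a parameter, and Workload F is assumed to use YCSB's stock 50% read / 50% read-modify-write mix. The class name site.ycsb.workloads.CoreWorkload is the one used by current YCSB releases; older releases used the com.yahoo.ycsb package prefix.

```python
# Operation mixes for the six workloads. A, B, C and F follow YCSB's core
# workloads; G and H are the presentation's additional update-heavy mixes.
MIXES = {
    "A": {"readproportion": 0.5, "updateproportion": 0.5},
    "B": {"readproportion": 0.95, "updateproportion": 0.05},
    "C": {"readproportion": 1.0},
    "F": {"readproportion": 0.5, "readmodifywriteproportion": 0.5},  # assumed stock mix
    "G": {"readproportion": 0.05, "updateproportion": 0.95},
    "H": {"updateproportion": 1.0},
}

OPERATION_KEYS = ("readproportion", "updateproportion", "scanproportion",
                  "insertproportion", "readmodifywriteproportion")


def ycsb_properties(workload: str, recordcount: int, operationcount: int) -> str:
    """Render a YCSB properties file for one workload and data-set size."""
    props = {
        "workload": "site.ycsb.workloads.CoreWorkload",
        "recordcount": recordcount,        # 100K, 280K or 700K in the slides
        "operationcount": operationcount,  # not stated in the slides
        "fieldcount": 10,                  # 10 fields per record ...
        "fieldlength": 100,                # ... of 100 bytes each, about 1 KB per record
    }
    mix = MIXES[workload]
    # Write every proportion explicitly so no CoreWorkload default leaks in.
    for key in OPERATION_KEYS:
        props[key] = mix.get(key, 0)
    if abs(sum(props[k] for k in OPERATION_KEYS) - 1.0) > 1e-9:
        raise ValueError(f"operation proportions for {workload} do not sum to 1")
    return "\n".join(f"{k}={v}" for k, v in props.items())


print(ycsb_properties("G", recordcount=700_000, operationcount=700_000))
```

The resulting text can be saved to a file and passed to YCSB with -P. Every proportion is written out explicitly, including zeros, because CoreWorkload otherwise falls back to its default read/update mix.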

Evaluation: Insertion
- MongoDB was 24% faster than Cassandra at inserting records

Workload A: 50% read, 50% update
- Cassandra is ~2.54 times faster than MongoDB
- Both databases ran faster on the 700K set than on the 280K set
  - NoSQL databases are optimized for larger data sets (spread across nodes/clusters)

Workload B: 95% read, 5% update
- MongoDB faster with smaller sets
- Cassandra faster with bigger sets

Workload C: 100% read
- Same behavior as Workload B
- Cassandra uses MemTables/SSTables
- MongoDB uses memory-mapped files
  - If there is not enough RAM, page faults occur
  - Large data = large number of page faults = slower
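Why a memory-mapped design slows down once the data outgrows RAM can be shown with a crude model: the data set is split into pages, RAM holds a fixed number of them under least-recently-used eviction, and each access to a page not in RAM counts as a page fault. The function below is a hypothetical simulation under that assumption, with uniformly random reads; it does not model the OS page cache, read-ahead, or MongoDB's actual storage engine.

```python
import random
from collections import OrderedDict


def page_fault_rate(dataset_pages: int, ram_pages: int, reads: int, seed: int = 0) -> float:
    """Fraction of uniformly random page reads that miss an LRU cache of `ram_pages` pages."""
    rng = random.Random(seed)  # fixed seed for repeatable runs
    cache = OrderedDict()      # page number -> present; order tracks recency of use
    faults = 0
    for _ in range(reads):
        page = rng.randrange(dataset_pages)
        if page in cache:
            cache.move_to_end(page)        # hit: mark as most recently used
        else:
            faults += 1                    # miss: page must be read from disk
            cache[page] = True
            if len(cache) > ram_pages:
                cache.popitem(last=False)  # evict the least recently used page
    return faults / reads


for pages in (100, 400, 1000):  # data set sizes relative to 200 pages of RAM
    print(pages, round(page_fault_rate(pages, ram_pages=200, reads=20_000), 3))
```

With uniformly random reads, the steady-state fault rate is roughly 1 - RAM/dataset once the data set is larger than RAM, so growing the data set at fixed RAM raises the fault rate sharply, which matches the trend described on the slide.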

Workload F: read-modify-write
- Read a record
- Modify it
- Write it back
- Same behavior as Workloads B/C

Workload G: 5% read, 95% update
- Cassandra much faster
- Writes are appends on Cassandra
  - Append to end of file
- Writes are in place on MongoDB
  - Locate page on disk
  - Load it into memory
  - Update
  - Write back to disk
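The contrast between Cassandra's append-only writes and MongoDB's in-place updates can be sketched with a toy log-structured store: every write is appended to a commit log and placed in an in-memory MemTable, a full MemTable is flushed as an immutable sorted SSTable, and reads check the MemTable first and then SSTables from newest to oldest. This is a simplified model, not Cassandra's code; compaction, tombstones and Bloom filters are omitted, and the class name and memtable_limit parameter are hypothetical.

```python
import bisect


class ToyLSMStore:
    """Append-only write path in the spirit of Cassandra's commit log + MemTable + SSTables."""

    def __init__(self, memtable_limit=4):
        self.commit_log = []   # sequential, append-only log for durability
        self.memtable = {}     # most recent writes, held in memory
        self.sstables = []     # immutable sorted runs (keys, values), newest last
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # append to end of file: no seek to old data
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the MemTable out once as a new sorted SSTable that is never modified."""
        run = sorted(self.memtable.items())
        self.sstables.append(([k for k, _ in run], [v for _, v in run]))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:  # newest data first
            return self.memtable[key]
        for keys, values in reversed(self.sstables):  # newer SSTables shadow older ones
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i]
        return None
```

An update never seeks to the old copy of a record: the newer value simply shadows it, which is why update-heavy workloads such as G and H favour this design, at the price of reads that may have to consult several SSTables.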

Workload H: 100% update
- Same behavior as Workload G

Summary
- RDBMS performance slows as applications grow and become more complex
- NoSQL popularity is rising, and NoSQL databases are being integrated into many production products
- Compared and contrasted MongoDB and Cassandra
- Based on the workload benchmarks, Cassandra appears faster for larger-scale applications
  - Small data set / mostly reads: MongoDB
  - Mostly writes / large data set: Cassandra